Coding Assignment 4 : Precondition Inference
Note that this has already been extended for an extra week from the original deadline of April 13. No further extension will be considered.
This assignment counts for 10% of the course grade.
Assignments turned in after the deadline but before the end of April 23 are subject to a 20% grade penalty.
In this assignment, you will be given a set of sentence pairs. In each pair, the first sentence is a precondition and the second is a statement. The goal is to develop a natural language reasoner that decides whether the precondition enables or disables the statement.
You are free to experiment with any method of representation or encoding, including any pre-trained models. Submit your predictions on the unlabeled test set. We will compare your predictions with the ground-truth labels of the test set on our side. Grading will be based on the ranking of your submission among all of those in the class.
( Please make sure to read the Q&A part )
CSCI 544, Spring 2022
Data and Jupyter Notebook
A compressed ZIP file is to be released on Blackboard when the assignment is available. The uncompressed archive contains the following files:

- main.ipynb: the Python 3 Jupyter notebook you will need to fill in with your training and inference code to predict the results.
- data folder:
  - pnli_train.tsv: labeled training data in tab-separated format. Each line is a tuple of the form (precondition, statement, label), where label=1 means "enable" and label=0 means "disable".
  - pnli_dev.tsv: labeled dev data.
  - pnli_test_unlabeled.tsv: unlabeled test data. Each line is a pair of precondition and statement.
- upload_document.txt: the documentation file where you need to fill in the blanks to describe your method.
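The notebook's provided starting code already reads these files, but for reference, a minimal loader for the TSV layout described above could look like this (the function name is illustrative, not part of the released code):

```python
import csv

def load_pairs(path, labeled=True):
    """Read a TSV file of (precondition, statement[, label]) rows."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for fields in csv.reader(f, delimiter="\t"):
            if labeled:
                precondition, statement, label = fields
                rows.append((precondition, statement, int(label)))
            else:
                precondition, statement = fields
                rows.append((precondition, statement))
    return rows
```

For the training and dev files, call it with `labeled=True`; for pnli_test_unlabeled.tsv, use `labeled=False`.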
This dataset is internal to the CSCI 544 class in Spring 2022. You are not allowed to distribute the dataset to anyone who is not a part of this class. Any such unallowed distribution is subject to severe penalty (including getting zero credit).
Programs and Models
We again provide a Jupyter notebook with the starting code for reading the data and the ending part that generates "upload_predictions.txt". You need to fill in the "Main Code Body" and put the 4850 predicted labels into the results list.
Restrictions. Your method needs to be implemented in Python 3. You are free to use any Python package (e.g., PyTorch, Hugging Face, AllenNLP, etc.). You are free to include any pre-trained models (any versions of Transformer language models, pre-trained NLI models, pre-trained QA models, etc.). However, only free models are allowed (hence, you cannot prompt GPT-3). You can consider running your experiments on Google Colab (which provides a free student membership), your own machine, or any computing resources that are available to you.
This assignment requires submitting three files (DO NOT CHANGE the filenames!):
- upload_predictions.txt: The predictions of your model on the 4850 sentence pairs of the unlabeled test set. This file should have exactly 4850 lines, each of which is either 0 or 1. (Submit this on Vocareum.)
- upload_document.txt: Fill in the blanks of that file to accordingly describe how your model is developed. (submit this on Blackboard )
- main.ipynb: This Jupyter notebook already contains the beginning part to read the data, and the end part to generate "upload_predictions.txt". You need to fill in the "Main Code Body" (submit this on Blackboard ).
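The notebook's ending code already produces upload_predictions.txt, but as a sanity check on the required format, a minimal writer (the function name and the checks are illustrative) could look like this:

```python
def write_predictions(results, path="upload_predictions.txt"):
    """Write one predicted label (0 or 1) per line, nothing else."""
    # The test set has exactly 4850 pairs; anything else will break grading.
    assert len(results) == 4850, "expected exactly 4850 predictions"
    assert all(label in (0, 1) for label in results), "labels must be 0 or 1"
    with open(path, "w") as f:
        f.writelines(f"{label}\n" for label in results)
```

Running a check like this before uploading helps catch formatting problems early, which matters because mis-formatted output receives a very low score.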
Multiple submissions are allowed; only the final submission will be graded. Do not include any other files in your submission. You are encouraged to submit early and often in order to iron out any problems, especially issues with the format of the final output.
We will again use a leaderboard protocol : the ground-truth labels on the test set are not released to you, and we will compare your prediction with them to calculate the accuracy of your prediction.
The accuracy of your predictions will be measured automatically; failure to format your output correctly may result in very low scores, which will not be changed.
For full credit, make sure to submit your assignment well before the deadline. The time of submission recorded by the system is the time used for determining late penalties. If your submission is received late, whatever the reason (including equipment failure and network latencies or outages), it will incur a late penalty.
As stated above, we will use a leaderboard protocol. After the due date, we will calculate the accuracy of your submitted prediction, and your grade will be decided by its standing among all valid submissions. Specifically, the top 20% of valid submissions will get a score of 10, the next 20% will get a score of 9, and so on. The bottom-line score for a valid submission is 6 (this is the default; we may choose to curve the scores depending on the performance distribution). However, your submission will be considered invalid, and will receive a score lower than the bottom line or of zero, if any of the following happens:
- The accuracy of your submission is very close to a trivial baseline performance (e.g., random guessing).
- The code in your submitted Jupyter notebook obviously cannot reproduce your predictions.
- You have not submitted the documentation file, so we do not know how your results can be reproduced; or, based on the submitted documentation, we find factual mistakes in your method that should cause it not to work (e.g., giving random or nonsensical predictions).
- Your submission has been red-flagged by plagiarism checking.
Q & A
How many labels are there?
This is a binary classification task. Only two labels are used: 1 ("enable") and 0 ("disable").
Are the test data from the same distribution of the training/dev data?
Yes, they were splits from the same dataset.
Is the dataset balanced?
How do I know if my method is well-performing or not?
You can gauge it by your performance on the dev set.
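Since grading is by accuracy, a dev-set check can be as simple as the following (the function name is illustrative; no such helper is provided in the notebook):

```python
def accuracy(predictions, gold_labels):
    """Fraction of examples where the predicted label matches the gold label."""
    assert len(predictions) == len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)
```

Comparing your model's dev accuracy against a trivial baseline (e.g., always predicting the majority label) tells you whether it has learned anything useful.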
What’s the evaluation metric?
Accuracy of your predicted labels against the ground truth on the test set, as described in the grading section above.
Can you suggest some methods to try out?
The most well-aligned formulation of this task is NLI. You can definitely train an NLI model from scratch using the training data. Besides, here are some directions you can consider:

- Fine-tuning a pre-trained NLI model.
- Fine-tuning a QA model (treating this task as binary QA).
- Fine-tuning a pre-trained LM with Next Sentence Prediction (NSP).
- Generative data augmentation.
- Incorporating external knowledge (e.g., from a knowledge base such as ConceptNet).
- etc.
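As a rough illustration of the first direction, the sketch below encodes a (precondition, statement) pair in the NLI premise/hypothesis format and takes one gradient step with Hugging Face transformers. The checkpoint `prajjwal1/bert-tiny` is chosen only to keep the example small; in practice you would pick a stronger model (e.g., one already fine-tuned on MNLI), and its classification head here is freshly initialized, so a real run would loop over the full training set for several epochs:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "prajjwal1/bert-tiny"  # illustrative tiny checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# The tokenizer joins each (precondition, statement) pair with the model's
# separator token, matching the NLI premise/hypothesis input format.
batch = tokenizer(["water is frozen"], ["the lake can be skated on"],
                  padding=True, truncation=True, return_tensors="pt")

# One fine-tuning step with standard cross-entropy on the gold label.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
labels = torch.tensor([1])  # 1 = "enable"
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: argmax over the two logits gives the 0/1 prediction.
model.eval()
with torch.no_grad():
    logits = model(**batch).logits  # shape: (batch, 2)
pred = logits.argmax(dim=-1).tolist()
```

The same encode-then-classify pattern extends directly to batched training over pnli_train.tsv and batched prediction over the test file.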
Is there a single best solution?
Again, this assignment is an open-ended problem. So there is hardly a single best solution (don’t ever try to ask TAs about it: they do not know it either).