Midterm Project

SS22-STT-481-001 – Capstone in Statistics (W)






(A recording of the instructions can be found here.)

Instructions:

1. The goal of this project is to predict the final price of each home, based on 79 explanatory variables that describe (almost) every aspect of residential homes in Ames, Iowa. See more details at https://www.kaggle.com/c/house-prices-advanced-regression-techniques.
2. The prediction performance won't be evaluated in this midterm project.
3. You need to use the dataset from the final project (the housing dataset on Kaggle) and apply the methods that we've learned and are going to learn in Chapter 6 (including KNN, linear regression, and model selection). Let me know if you don't know how to download the dataset or submit your results.
4. DO NOT use the caret package. If your results are based on any functions from the caret package, they will receive zero credit.
5. The key to getting an "excellent" score:
  • Comment on what you see in your model fitting and prediction results. Don't just show your R code and output without any comments.
  • Read the lecture notes carefully if you don't have time to read the textbook.
6. The report format is similar to your homework. Please use R Markdown for your report if you can, though it is not required. The report should be either a PDF or an HTML file.
7. Submit your report via D2L by the deadline, 03/30/2022, before the beginning of class.
8. This is a hard deadline. Late submissions are NOT accepted.
9. This project should be done on your own; it is not a team project.
10. I will randomly choose about 6-8 students to present their projects in class on 03/30/2022. Each presenter has 5 minutes to present their findings. There is no need to prepare slides; your presentation won't be evaluated, so just use the report that you submit.
11. You need to attend the class on 03/30/2022 (virtually). If you are chosen to present in class on 03/30/2022 but do not show up, you will get zero credit on your midterm project. The random assignment will be done at the beginning of the class on 03/30/2022.
12. If you need an accommodation for the class on 03/30/2022 (e.g., illness), please let me know by 03/30/2022.

The evaluation will be based on the following items:

1. KNN Method* (20%)
2. Linear Regression (20%)
3. Subset Selection (20%)
4. Shrinkage Methods (20%)
5. Estimated Test Errors (CV estimate) and True Test Errors** of the models in Items 1-4. (20%)
6. Bonus: Preprocessing*** (10%)

*KNN Method: hint: use the knn.reg function from the FNN package.

**True Test Errors: you need to submit your results to Kaggle to see the true test errors. Take screenshots of all the results you submit, along with a description indicating which method was used. See below for an example, where I submitted the lasso and KNN methods. Attach the screenshots to your report or upload them to the same folder along with your report. Failing to do this will cost partial credit. Here is the video of how to submit your predictions to Kaggle.

***Preprocessing: If you want to do preprocessing, please read the descriptions of the predictors very carefully; then you will know how to deal with the missing values. If you don't want to do preprocessing or you don't need the bonus, you can use the datasets I attached here. These datasets have no missing values, and I removed all the categorical variables. If you use them, you will get zero credit out of 10% for the "Preprocessing" item.

/content/enforced/683969-FS18-STT-481-001-97MAE4-EL-32-798/train_new.csv

/content/enforced/683969-FS18-STT-481-001-97MAE4-EL-32-798/test_new.csv

Detailed grading items:

1. KNN Method (20%)
  • Explain how you choose the tuning parameter k.
  • Perform prediction with the k you chose. (A minimal KNN sketch follows this list.)
2. Linear Regression (20%)
  • Fit a linear regression model using all the predictors except Id.
  • Residual diagnostics.
  • What are the remedies if the model assumptions are violated? After these remedies, check the residual diagnostics again. Do they look better?
  • Explain some of the most significant coefficients.
  • Perform prediction. (See the regression sketch after this list.)
3. Subset Selection (20%)
  • Which subset selection method do you use: best subset, forward stepwise, or backward stepwise? Explain why you chose that method.
  • Which criterion do you use for selecting your model: Cp, BIC, adjusted R-squared, or CV? Explain why you chose that criterion.
  • Explain some of the coefficients in your selected model.
  • Perform prediction. (See the subset-selection sketch after this list.)
4. Shrinkage Methods (20%)
  • Do both ridge regression and the lasso.
  • Explain how you choose the tuning parameter for each method.
  • Explain which one does a better job in terms of model interpretation.
  • Explain the coefficients in the fitted models for both methods.
  • Perform prediction with both methods. (See the glmnet sketch after this list.)
5. Estimated Test Errors (CV estimate) and True Test Errors of the models in Items 1-4 (20%)
  • Compute the cross-validation (CV) mean squared error (MSE) for each method.
  • Submit your predictions to Kaggle to see the true test error for each method.
  • Compare the CV MSEs and the true test errors. Do they look similar? Note: the true test error is computed as the root mean squared error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price (see here), so your CV error unit might differ from the true test error unit. (See the CV sketch after this list.)
6. Bonus: Preprocessing (10%)
  • Include a discussion of what preprocessing you apply.
  • If you don't get a chance to do the preprocessing, please use the data I provided in the instructions (which has been cleaned). (A preprocessing sketch appears under the Comments section below.)
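For item 1, here is a minimal sketch of choosing k by cross-validation with knn.reg from the FNN package. It assumes the cleaned train_new.csv and test_new.csv files from the instructions, with Id and SalePrice column names as in the Kaggle data; adjust the names and the k grid to your own setup.

```r
library(FNN)

train <- read.csv("train_new.csv")
test  <- read.csv("test_new.csv")

# KNN is distance-based, so standardize the predictors, applying the
# training means/SDs to the test set
X.train <- scale(subset(train, select = -c(Id, SalePrice)))
X.test  <- scale(subset(test, select = -Id),
                 center = attr(X.train, "scaled:center"),
                 scale  = attr(X.train, "scaled:scale"))
y <- train$SalePrice

# With test = NULL, knn.reg performs leave-one-out CV, so comparing
# y against fit$pred gives the LOOCV MSE for each k
ks     <- 1:30
cv.mse <- sapply(ks, function(k) {
  fit <- knn.reg(train = X.train, y = y, k = k)
  mean((y - fit$pred)^2)
})
best.k <- ks[which.min(cv.mse)]
plot(ks, cv.mse, type = "b", xlab = "k", ylab = "LOOCV MSE")

# Prediction on the test set with the chosen k
knn.pred <- knn.reg(train = X.train, test = X.test, y = y, k = best.k)$pred
```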
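For item 2, a sketch of the fit, the built-in diagnostic plots, and one common remedy (a log transform of the response; whether it is appropriate is exactly what your diagnostics should tell you):

```r
# Fit on all predictors except Id
lm.fit <- lm(SalePrice ~ . - Id, data = train)
summary(lm.fit)

# Residual diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(lm.fit)

# One possible remedy for skewness / non-constant variance:
# model log(SalePrice), then re-check the diagnostics
lm.log <- lm(log(SalePrice) ~ . - Id, data = train)
par(mfrow = c(2, 2))
plot(lm.log)

# Prediction (back-transform if you modeled the log response)
lm.pred <- exp(predict(lm.log, newdata = test))
```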
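For item 3, a sketch using regsubsets from the leaps package. Forward stepwise and BIC appear here only as example choices; the write-up asks you to justify your own.

```r
library(leaps)

p <- ncol(train) - 2  # number of predictors, excluding Id and SalePrice

# Forward stepwise; method = "exhaustive" gives best subset and
# method = "backward" gives backward stepwise
fwd.fit <- regsubsets(SalePrice ~ . - Id, data = train,
                      nvmax = p, method = "forward")
fwd.sum <- summary(fwd.fit)

# Model size chosen by BIC; fwd.sum$cp and fwd.sum$adjr2 are the other
# built-in criteria
best.size <- which.min(fwd.sum$bic)
coef(fwd.fit, id = best.size)

# regsubsets has no predict() method, so multiply the test model matrix
# by the selected coefficients directly
test.mat <- model.matrix(~ . - Id, data = test)
beta     <- coef(fwd.fit, id = best.size)
sub.pred <- drop(test.mat[, names(beta)] %*% beta)
```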
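For item 4, a sketch with the glmnet package: cv.glmnet chooses the tuning parameter lambda by 10-fold CV, with alpha = 0 for ridge and alpha = 1 for the lasso.

```r
library(glmnet)

x      <- model.matrix(SalePrice ~ . - Id, data = train)[, -1]  # drop intercept
y      <- train$SalePrice
x.test <- model.matrix(~ . - Id, data = test)[, -1]

# 10-fold CV over a grid of lambda values for each method
set.seed(1)
cv.ridge <- cv.glmnet(x, y, alpha = 0)  # ridge
cv.lasso <- cv.glmnet(x, y, alpha = 1)  # lasso

# Coefficients at the CV-chosen lambda; the lasso sets some exactly to
# zero, which is why it tends to be easier to interpret
coef(cv.ridge, s = "lambda.min")
coef(cv.lasso, s = "lambda.min")

# Predictions from both methods
ridge.pred <- predict(cv.ridge, newx = x.test, s = "lambda.min")
lasso.pred <- predict(cv.lasso, newx = x.test, s = "lambda.min")
```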
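For item 5, a sketch of a 10-fold CV estimate, using the plain linear model as the example; reuse the same fold assignment for every method so the comparison is fair. The log-scale line mirrors Kaggle's RMSE-between-logs metric and assumes the predictions are positive. The lasso.pred object in the submission file comes from the item 4 sketch.

```r
set.seed(1)
K     <- 10
folds <- sample(rep(1:K, length.out = nrow(train)))

# CV error on the raw scale (MSE) and on the log scale, the latter to
# match how Kaggle scores the true test error
cv.stats <- sapply(1:K, function(j) {
  fit  <- lm(SalePrice ~ . - Id, data = train[folds != j, ])
  pred <- predict(fit, newdata = train[folds == j, ])
  obs  <- train$SalePrice[folds == j]
  c(mse      = mean((obs - pred)^2),
    log.rmse = sqrt(mean((log(pred) - log(obs))^2)))  # requires pred > 0
})
rowMeans(cv.stats)

# Kaggle expects a CSV with an Id column and a SalePrice column
write.csv(data.frame(Id = test$Id, SalePrice = as.vector(lasso.pred)),
          "submission_lasso.csv", row.names = FALSE)
```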

Due Date

Mar 30, 2022 1:50 PM


Comments

A quick suggestion: when you do the preprocessing, you'll likely notice that some categorical predictors have different numbers of levels in the training and testing datasets. When you apply model.matrix or other methods to transform these variables into dummy variables, the column names and the number of columns are then no longer the same in the training and test datasets, which can result in errors when you try to make predictions. One strategy to deal with this, among others, is to combine the two datasets first, use model.matrix on the combined data to handle the categorical predictors, and then split the result back into training and test sets.
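Here is a minimal sketch of that strategy. It assumes the raw Kaggle train.csv and test.csv files, and that the missing values have already been handled, since model.matrix silently drops rows containing NA, which would break the split below.

```r
# Read the raw Kaggle files, keeping strings as factors
train.raw <- read.csv("train.csv", stringsAsFactors = TRUE)
test.raw  <- read.csv("test.csv",  stringsAsFactors = TRUE)
# ... deal with the missing values here first ...

n.train  <- nrow(train.raw)
combined <- rbind(subset(train.raw, select = -SalePrice), test.raw)

# One model.matrix call on the combined data guarantees identical dummy
# columns for both sets; then split back apart
X       <- model.matrix(~ . - Id, data = combined)[, -1]
X.train <- X[1:n.train, ]
X.test  <- X[-(1:n.train), ]
```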