ECON 178 WI21:

Final project Guidelines

Instructor: Ying Zhu

Ying Zhu 2022

Overview of the data

  • The data is from the 1991 Survey of Income and Program Participation (SIPP). You are provided with 7933 observations.
  • The sample contains household data in which the reference persons are aged 25-64. At least one person is employed, and no one is self-employed. The observation units correspond to the household reference persons.
  • The data set contains a number of feature variables that you can choose to predict total wealth. The outcome variable (total wealth) and feature variables are described in the next slide.

Dataframe with the following variables

Variable to predict (outcome variable):

  • tw: total wealth (in US $).
    • Total wealth equals net financial assets, including Individual Retirement Account (IRA) and 401(k) assets, plus housing equity plus the value of business, property, and motor vehicles.
Variables related to retirement (features):
  • ira: individual retirement account (IRA) (in US $).
  • e401: 1 if eligible for 401(k), 0 otherwise.
Financial variables (features):
  • nifa: non-401k financial assets (in US $).
  • inc: income (in US $).
Variables related to home ownership (features):
  • hmort: home mortgage (in US $).
  • hval: home value (in US $).
  • hequity: home value minus home mortgage.
Other covariates (features):
  • educ: education (in years).
  • male: 1 if male, 0 otherwise.
  • twoearn: 1 if two earners in the household, 0 otherwise.
  • nohs, hs, smcol, col: dummies for education: no high school, high school, some college, college.
  • age: age.
  • fsize: family size.
  • marr: 1 if married, 0 otherwise.

What is 401k and IRA?

  • Both 401(k) and IRA are tax-deferred savings options that aim to increase individual saving for retirement
  • The 401(k) plan:
    • a company-sponsored retirement account to which employees can contribute
    • employers can match a certain % of an employee's contribution
    • 401(k) plans are offered by employers, so only employees in companies offering such plans can participate
    • The feature variable e401 contains information on the eligibility
  • IRA accounts:
    • Individuals can participate
    • No employer matching
    • The feature variable ira contains the IRA account value (in US $)

Reference: https://www.investopedia.com/ask/answers/12/401k.asp

Your tasks

  • Build a prediction/fitted model to predict total wealth (tw) in US dollars
  • Write up a paper, up to 20 pages (not including the code), size 11 font, and 1.5 spacing
    • Introduction
      • Briefly state the objectives of the study
    • Statistical analyses
      • Describe how you apply the tools you have learned from this course to perform the prediction task
      • You should try different methods and compare their prediction performance and interpretability
    • Conclusions
      • Summarize what you have discovered from this project
      • (Optional) Discuss caveats to the conclusions drawn from your analyses

  • Bonus points
    • We kept 20% of the sample on which we are going to run your proposed model and method. We will rank the students by accuracy of the prediction on that 20% of the sample.

  • The project is due on March 17 (by 12:30pm PST). Please submit your paper and code according to the instructions. Late assignments will NOT be accepted except with my prior consent regarding unusual circumstances permitted by University policies (proper documentation will be needed).

Grading policy

  • First, please follow the policy on academic integrity stated in the syllabus:
    • You are not allowed to work together with others on the final project and the bonus opportunity; you are not allowed to get any help (including but not limited to program code) from others on the final project and the bonus opportunity.
    • We will use tools to catch any form of plagiarism and cheating. Penalties on cheating include, among others, a failing grade for the course. In addition, the Council of Deans of Student Affairs will impose a disciplinary penalty.
    • Every student in ECON178 must read, understand, agree and sign the integrity pledge (https://academicintegrity.ucsd.edu/forms/form-pledge.html) before completing any assignment for ECON178. After you sign the pledge form, a receipt will be emailed to you. Please include this receipt in the submission of your assignment.
  • Second, the maximum number of points (without the bonus points) you can get for the project is 40. Your project grade counts for 55% of your course grade. Slide 7 provides a breakdown of the points and how your project is graded.
  • Third, a maximum of 40 bonus points is awarded on the basis of how good your out-of-sample prediction is. The best prediction receives 40 points. The second-best prediction receives fewer than 40 points, and so on. The bonus points you earn count for 5% of your course grade.
  • Fourth, the bonus points can only benefit your final grade. We will curve the grades without the bonus points first. Say you are in the A bracket: you will stay in the A bracket even if you get zero bonus points. On the other hand, if you are in the A- bracket but you get enough bonus points to move your final grade to the A bracket, then you will get an A in the end.
  • Fifth, it is entirely possible that you get the maximum points on the project but zero bonus points. After all, luck may be needed to get a high enough accuracy on the out-of-sample prediction. But as explained above, you will never be penalized for not having luck. Having said this, we still expect that harder work is more likely to lead to higher bonus points. So, you should put in your best effort.

Grading

| Criterion | 0 - 10 points | 10 - 30 points | 30 - 40 points |
| --- | --- | --- | --- |
| Analysis (50% of total points) | Analysis is overly simplistic or inappropriate; little or no justification for choices of analyses is provided | Analysis is appropriate; some justification for choices of analyses is provided | Analysis is appropriate and informative; detailed justification for choices of analyses is provided |
| Results (25% of total points) | Conclusions are missing, incorrect, or not drawn from analysis; plots or tables are inappropriate | Conclusions are sensible and drawn from analysis; plots or tables are appropriate | Conclusions are not only drawn from analysis but also insightful; plots or tables are nicely presented and facilitate conveying the information |
| Code (15% of total points) | Code doesn't run, or code outputs do not match the results described in the paper | Code runs and code outputs mostly match the results described in the paper | Code runs and code outputs match the results described in the paper; code is neat and easy to read; no irrelevant code |
| Paper writing (10% of total points) | Writing is poor, illogical, or incoherent | Writing is mostly logical and coherent | Writing is crystal clear, logical, and coherent |

Note: The TAs will give a couple of examples in your discussion section of what we mean by giving justification for choices of analyses.

How to carry out this project?

  • Data can be found on Canvas
    • Download the data and save it in your working directory
    • To load the data into R, use the code: data_tr <- read.table("data_tr.txt", header = TRUE, sep = "\t", dec = ".")[, -1]
  • Inspecting your data and preliminary analyses
    • Dependent variable ( Y ): tw: total wealth (in US $)
    • Predictors ( X ): your choice (but please make sensible choices)
    • Some suggestions: use scatter plots and/or simple linear regressions with OLS to visualize basic relationships between total wealth and various predictors
  • In-depth analyses
    • What could be the X variables in your prediction exercise?
    • What methods should you use? (OLS, Ridge, stepwise selection, Lasso)
    • How do you select the best prediction/fitted model? (K-fold cross-validation, leave-one-out)
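The inspection step can be sketched as follows. Since the Canvas file is not reproduced here, the block builds a small synthetic stand-in called `toy` (a hypothetical name) with a few of the same column names as `data_tr`; the values are simulated and say nothing about the real SIPP sample. With the real data, `toy` would simply be replaced by `data_tr`.

```r
# Synthetic stand-in for data_tr (simulated values, NOT the SIPP data).
set.seed(178)
n <- 200
toy <- data.frame(
  inc  = runif(n, 10000, 120000),           # income in US $
  age  = sample(25:64, n, replace = TRUE),  # age in years
  e401 = rbinom(n, 1, 0.4)                  # 401(k) eligibility dummy
)
toy$tw <- 0.8 * toy$inc + 500 * toy$age + 5000 * toy$e401 +
  rnorm(n, sd = 20000)

# Quick look at the structure and at the outcome variable.
str(toy)
summary(toy$tw)

# Simple linear regression of total wealth on income, fitted by OLS.
fit_inc <- lm(tw ~ inc, data = toy)

# Scatter plot of total wealth against income with the fitted OLS line.
plot(toy$inc, toy$tw, xlab = "inc (US $)", ylab = "tw (US $)")
abline(fit_inc, col = "red")

# Intercept and slope of the simple regression.
coef(fit_inc)
```

Repeating the plot and the one-variable regression for each candidate predictor gives a first impression of which relationships look roughly linear and which do not.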
What could be the X variables in your prediction exercise?
  • The plain predictors listed on Slide 3
    • Watch out for perfect collinearity: you do not want to include predictors that are perfectly collinear.
    • For example, you don't want to include hmort (home mortgage), hval (home value), and hequity (home value minus home mortgage) all at the same time, because hequity = hval - hmort. One solution is to drop hequity from your models.
    • As another example of perfect collinearity, say you include the intercept term (a column of 1s) and all four dummy variables nohs, hs, smcol, col (no high school, high school, some college, college). Note that nohs + hs + smcol + col = a column of 1s (the intercept). One solution is to drop one of the education dummies from your models.
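A small sketch of what perfect collinearity looks like in practice, using simulated housing variables (not the SIPP data). It assumes plain `lm()` is used for OLS; in that case R does not raise an error but reports the redundant coefficient as `NA`.

```r
# Synthetic housing variables (simulated values, NOT the SIPP data).
set.seed(1)
n <- 100
hval    <- runif(n, 50000, 300000)      # home value
hmort   <- runif(n, 0, 1) * hval        # mortgage, between 0 and home value
hequity <- hval - hmort                 # exact linear combination of the two
tw      <- 10000 + 0.5 * hequity + rnorm(n, sd = 5000)

# With all three housing variables the design matrix is rank deficient:
# lm() cannot identify one coefficient and reports it as NA.
fit_all <- lm(tw ~ hmort + hval + hequity)
coef(fit_all)

# Dropping hequity removes the redundancy; every coefficient is estimated.
fit_ok <- lm(tw ~ hmort + hval)
coef(fit_ok)

# The same issue arises with an intercept plus a full set of dummies:
# the four education dummies add up to a column of 1s.
level <- sample(c("nohs", "hs", "smcol", "col"), n, replace = TRUE)
dummies <- sapply(c("nohs", "hs", "smcol", "col"),
                  function(l) as.numeric(level == l))
all(rowSums(dummies) == 1)
```

Checking for `NA` coefficients after every OLS fit is a cheap way to catch a redundant predictor before moving on to Ridge, stepwise selection, or Lasso.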
  • Transformations of the plain predictors listed on Slide 3: use what you have learned from Topic 6: Flexible Linear Models
    • Polynomial transformation
    • The spline basis representation
    • Transformation using binary indicators
    • Generalized additive models (GAM)
    • Interacting dummy variables with other variables; for example, age x twoearn

Before transforming the plain predictors, scatter plots may help you to visualize how each predictor is associated with the total wealth. For example, if you see a nonlinear relationship, you might want to consider some type of polynomial transformation or the spline basis representation.
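A hedged sketch of how some of these transformations can be written as `lm()` formulas. The data frame `toy` is simulated (not the SIPP data), and the degree, the spline degrees of freedom, and the age bins are arbitrary choices made for illustration only. GAMs are left out because they need an additional package.

```r
library(splines)  # part of the standard R distribution; provides bs()

# Synthetic data (simulated values, NOT the SIPP data).
set.seed(2)
n <- 300
toy <- data.frame(
  age     = sample(25:64, n, replace = TRUE),
  inc     = runif(n, 10000, 120000),
  twoearn = rbinom(n, 1, 0.5)
)
toy$tw <- 2000 * toy$age - 15 * toy$age^2 + 0.5 * toy$inc +
  300 * toy$age * toy$twoearn + rnorm(n, sd = 15000)

# Polynomial transformation: cubic polynomial in age.
fit_poly <- lm(tw ~ poly(age, 3) + inc, data = toy)

# Spline basis representation: cubic B-spline in income with 5 df.
fit_spline <- lm(tw ~ age + bs(inc, df = 5), data = toy)

# Binary indicators: cut age into four bins and use them as dummies.
toy$age_bin <- cut(toy$age, breaks = c(24, 34, 44, 54, 64))
fit_bins <- lm(tw ~ age_bin + inc, data = toy)

# Interaction of a dummy with another variable: age x twoearn.
# age * twoearn expands to age + twoearn + age:twoearn.
fit_inter <- lm(tw ~ age * twoearn + inc, data = toy)

# Number of estimated coefficients in each specification.
sapply(list(poly = fit_poly, spline = fit_spline,
            bins = fit_bins, inter = fit_inter),
       function(f) length(coef(f)))
```

More flexible specifications add coefficients quickly, which is one reason to judge them by out-of-sample error rather than by in-sample fit.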

Collection of methods

We have already seen:

  • OLS
  • Ridge regressions
  • Stepwise selection methods
  • Lasso

Note:

  1. In the project, you should select different methods from the list above and **compare their prediction performance and interpretability**
    1. For Ridge, Stepwise selection, and Lasso, don't forget the use of **Cross-Validation**
    2. In addition to prediction performance, you might want to think about whether the set of predictors used to predict total wealth makes intuitive sense
Compare the prediction performances of different methods (an example)
  • Partition the ENTIRE data into a training set and a test set
  • Say you have applied the Ridge regression procedure and the Lasso procedure
  • For Ridge, you use the K-fold CV (Slide 12) to choose the best $\lambda$ (call it $\hat{\lambda}^{R}$).
  • For Lasso, you also use the K-fold CV (Slide 12) to choose the best $\lambda$ (call it $\hat{\lambda}^{L}$).
  • $\hat{\lambda}^{R}$ doesn't necessarily equal $\hat{\lambda}^{L}$
  • Which method do you choose? Ridge or Lasso?
  • You use Ridge with $\hat{\lambda}^{R}$ and Lasso with $\hat{\lambda}^{L}$, respectively, to predict the outcomes with the predictors in the test set, and compute the test mean squared error $MSE_{te}$ (also called MSPE)
  • If $MSE_{te}^{R}$ is substantially larger than $MSE_{te}^{L}$, choose Lasso; otherwise, choose Ridge
  • If $MSE_{te}^{R}$ and $MSE_{te}^{L}$ are similar, choose the one whose fitted model you find easier to understand (e.g., one with fewer predictors, where the predictors are intuitive)
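The test-set comparison can be sketched in base R as below. Several things here are assumptions made for illustration: the data are simulated, `ridge_fit()` and `ridge_predict()` are hypothetical helpers implementing the closed-form Ridge estimator on standardized predictors, the Ridge penalty is fixed at 10 as if it had already been chosen by CV, and OLS stands in for the second method because base R has no built-in Lasso. Swapping in a Lasso fit from a dedicated package would follow the same pattern.

```r
# Closed-form Ridge on standardized predictors (hypothetical helper).
# lambda = 0 reproduces OLS.
ridge_fit <- function(X, y, lambda) {
  mu <- colMeans(X)
  s  <- apply(X, 2, sd)
  Z  <- scale(X, center = mu, scale = s)
  b  <- solve(crossprod(Z) + lambda * diag(ncol(Z)),
              crossprod(Z, y - mean(y)))
  list(mu = mu, s = s, b = drop(b), a = mean(y))
}

# Predictions for new rows, using the training means and scales.
ridge_predict <- function(fit, X) {
  Z <- scale(X, center = fit$mu, scale = fit$s)
  fit$a + drop(Z %*% fit$b)
}

# Synthetic data (simulated values, NOT the SIPP data).
set.seed(3)
n <- 400
p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(3, -2, 1, rep(0, p - 3))) + rnorm(n)

# Partition the ENTIRE data into a training set (80%) and a test set (20%).
test_id <- sample(n, size = round(0.2 * n))
X_tr <- X[-test_id, ]
y_tr <- y[-test_id]
X_te <- X[test_id, ]
y_te <- y[test_id]

# Fit both methods on the training set only.
fit_ridge <- ridge_fit(X_tr, y_tr, lambda = 10)  # lambda assumed chosen by CV
fit_ols   <- ridge_fit(X_tr, y_tr, lambda = 0)

# Test mean squared error of each method.
mse <- function(y, yhat) mean((y - yhat)^2)
mse_te <- c(ridge = mse(y_te, ridge_predict(fit_ridge, X_te)),
            ols   = mse(y_te, ridge_predict(fit_ols, X_te)))
mse_te
names(which.min(mse_te))
```

The test set is touched only once, at the very end; all tuning happens inside the training set.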

K-fold cross validation

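  1. Partition the training data $T$ into $K$ separate sets of equal size
    • $T = (T_1, T_2, \dots, T_K)$; e.g., $K = 5$ or $10$
  2. For a given $\lambda$ and each $k = 1, 2, \dots, K$, estimate the model with all data excluding $T_k$
    • Denote the obtained model by $\hat{f}_{-k,\lambda}(\cdot)$
  3. Predict the outcomes for $T_k$ with the model from Step 2 and the input data in $T_k$
    • The predicted outcomes are $\hat{f}_{-k,\lambda}(x_i)$, where $i \in T_k$
  4. Compute the sample mean squared (prediction) error for $T_k$, known as the CV prediction error:
    • $CV_k(\lambda) = \frac{1}{|T_k|} \sum_{i \in T_k} \left( y_i - \hat{f}_{-k,\lambda}(x_i) \right)^2$
  5. Compute the average of $CV_k(\lambda)$ over all $K$ sets for each $\lambda$
    • $CV_{av}(\lambda) = \frac{1}{K} \sum_{k=1}^{K} CV_k(\lambda)$
  6. Select $\hat{\lambda}$ as the $\lambda$ that gives the smallest $CV_{av}(\lambda)$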
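A minimal base R sketch of the K-fold procedure, applied to Ridge. It is not the discussion-section code: the data are simulated, the grid `lambdas` is an arbitrary choice, and `ridge_fit()` and `ridge_predict()` are hypothetical helpers implementing the closed-form Ridge estimator on standardized predictors.

```r
# Closed-form Ridge on standardized predictors (hypothetical helper).
ridge_fit <- function(X, y, lambda) {
  mu <- colMeans(X)
  s  <- apply(X, 2, sd)
  Z  <- scale(X, center = mu, scale = s)
  b  <- solve(crossprod(Z) + lambda * diag(ncol(Z)),
              crossprod(Z, y - mean(y)))
  list(mu = mu, s = s, b = drop(b), a = mean(y))
}
ridge_predict <- function(fit, X) {
  Z <- scale(X, center = fit$mu, scale = fit$s)
  fit$a + drop(Z %*% fit$b)
}

# Synthetic training data (simulated values, NOT the SIPP data).
set.seed(4)
n <- 200
p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, rep(0, p - 2))) + rnorm(n)

K <- 5
lambdas <- c(0, 0.1, 1, 10, 100)          # candidate tuning parameters

# Randomly partition the training data into K folds of equal size.
fold <- sample(rep(1:K, length.out = n))

# cv[k, j] holds the prediction error on fold k for lambdas[j].
cv <- matrix(NA_real_, nrow = K, ncol = length(lambdas))
for (j in seq_along(lambdas)) {
  for (k in 1:K) {
    out  <- fold == k
    fit  <- ridge_fit(X[!out, ], y[!out], lambdas[j])  # fit without fold k
    pred <- ridge_predict(fit, X[out, ])               # predict on fold k
    cv[k, j] <- mean((y[out] - pred)^2)                # error on fold k
  }
}

# Average over the K folds and pick the lambda with the smallest average.
cv_av <- colMeans(cv)
lambda_hat <- lambdas[which.min(cv_av)]
lambda_hat
```

The same double loop works for any method with a tuning parameter; only the fit and predict calls change.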

You can use the code from the discussion sections…

Lastly,

  • Please do not leave the project to the last minute. Start early
  • Both TAs and I will be happy to answer your questions about the project

Copyright

  • My pre-recorded video lectures are protected by U.S. copyright law and by University policy. I am the exclusive owner of the copyright in those materials I create.
  • You may take notes and make copies of course materials for your own use. You may also share those materials with another student who is enrolled in or auditing this course.
  • You may not reproduce, distribute or display (post/upload) lecture notes or recordings or course materials in any other way whether or not a fee is charged without my express prior written consent. You also may not allow others to do so. If you do so, you may be subject to student conduct proceedings under the UC San Diego Student Code of Conduct.
  • Similarly, you own the copyright in your original papers. If I am interested in posting your answers or papers on the course web site, I will ask for your written permission.