ECON 322: Econometric Analysis 1

The goal of this project is to test whether wearing seat belts has an impact on the number of traffic fatalities. The data file seatbelts.rda for this project is in the Final Project folder of Learn. It was used by A. Cohen and L. Einav in 2003 in their research paper The Effects of Mandatory Seat Belt Laws on Driving Behavior and Traffic Fatalities.

The data set is a panel of 51 states (actually, 50 states plus the District of Columbia), running from 1983 to 1997. The variables are:

  • year: indicating year.
  • state: indicating US state (abbreviation).
  • fatalities: number of fatalities per million of traffic miles.
  • seatbelt: seat belt usage rate, as self-reported by state population surveyed.
  • speed65: Is there a 65 mile per hour speed limit?
  • speed70: Is there a 70 (or higher) mile per hour speed limit?
  • drinkage: Is there a minimum drinking age of 21 years?
  • alcohol: Is there a maximum of 0.08 blood alcohol content?
  • income: median per capita income (in current US dollars).
  • age: mean age.
  • enforce: indicating seat belt law enforcement (no, primary, secondary). The definition of the different enforcement levels is given on the Governors Highway Safety website. Basically, primary enforcement means that officers can issue a ticket for not wearing a seat belt even if there is no other traffic infraction. For secondary enforcement, there must be another traffic infraction before officers can issue a ticket for not wearing a seat belt.

Part I

In the first part of the project, we want to estimate a model year by year. It is not the best way when we have a panel because it is more efficient to use all the data in a single model, but since we have not covered how to estimate panel data models, it is a good way to start.

As you can see, the proportion of states that adopted a seat belt law went from 0% in 1983 to 100% in 1997. Not all States, however, chose the same level of enforcement.

law <- matrix(data$enforce, nrow=15) ## create a matrix num-years x num-states
no <- rowSums(law=="no")
prim <- rowSums(law=="primary")
sec <- rowSums(law=="secondary")
ylim <- range(c(no, sec, prim))
plot(1983:1997, no, xlab="year", ylab="states", type="l", col=1, ylim=ylim,
     main="number of states with seat belt laws")
lines(1983:1997, sec, type="l", col=2)
lines(1983:1997, prim, type="l", col=3)
legend("topright", c("no", "secondary", "primary"), col=1:3, lty=1)

[Figure: number of states with seat belt laws, by enforcement level (no, secondary, primary); x-axis: year, y-axis: states.]

It may therefore be difficult to base the model selection on the first few years because the number of states with a seat belt law was too small. Therefore, use the year 1987 only to select your model.

We want to compare the effect of the law on the number of fatalities. For now, just distinguish States with and without a seat belt law. For that, create a dummy variable equal to 1 if there is a seat belt law and 0 otherwise:

data$law <- as.numeric(data$enforce != "no")
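
As a quick sanity check, the counts behind the figure and the share of states with a law can be tabulated directly. The sketch below uses a tiny made-up panel (four hypothetical states, three years), not seatbelts.rda. Unlike the matrix() approach, table() does not depend on how the rows are ordered:

```r
# Made-up panel for illustration only: 4 hypothetical states x 3 years.
d <- data.frame(
  state   = rep(c("A", "B", "C", "D"), each = 3),
  year    = rep(1983:1985, times = 4),
  enforce = c("no", "no", "secondary",
              "no", "primary", "primary",
              "no", "no", "no",
              "no", "secondary", "secondary")
)

# Number of states in each enforcement category, year by year.
counts <- table(d$year, d$enforce)
print(counts)

# Same dummy as in the text, then the share of states with a law per year.
d$law <- as.numeric(d$enforce != "no")
share <- tapply(d$law, d$year, mean)
print(share)
```

With the real data, replacing d by data should give the numbers plotted in the figure, assuming the variable names listed above.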

The dependent variable is fatalities and we want to measure the effect of law on it by controlling for the right variables and by using the appropriate functional form. The selection process should include the following (not necessarily in that order).

  • Discussion on which variable should be included and why.
  • Discussion on how each variable should enter the model (in log, with interactions, squared, etc.). It may not be obvious for all variables, but try your best.
  • Estimate the model (or models if you have more than one in mind).
  • Test for correct specification (Chapter 9), homoscedasticity (Chapter 8).
  • Any other things to look for before going to the interpretation part?
  • Interpret the results and discuss the possible weaknesses of the model.
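
For the specification and homoscedasticity checks, one common choice (the course chapters may intend other tests) is a RESET test and a Breusch-Pagan test. The sketch below computes both by hand in base R on simulated data. The data frame d, the regressors, and the functional form are all assumptions made for illustration, not the selected model:

```r
# Simulated cross-section standing in for one year of data (NOT the real data).
set.seed(1)
n <- 51
d <- data.frame(
  law    = rbinom(n, 1, 0.5),
  income = runif(n, 15000, 30000),
  age    = runif(n, 33, 38)
)
d$fatalities <- 0.03 - 0.001 * d$law - 0.004 * log(d$income / 20000) +
  rnorm(n, sd = 0.003)

# Hypothetical candidate model.
fit <- lm(fatalities ~ law + log(income) + age, data = d)

# RESET: add the square and cube of the fitted values and test them jointly.
# The fitted values are standardized first for numerical stability; this
# leaves the F test unchanged because the model contains an intercept.
z <- as.numeric(scale(fitted(fit)))
d$z2 <- z^2
d$z3 <- z^3
fit_reset <- lm(fatalities ~ law + log(income) + age + z2 + z3, data = d)
reset_p <- anova(fit, fit_reset)[2, "Pr(>F)"]

# Breusch-Pagan (LM form): regress the squared residuals on the regressors;
# n * R^2 is asymptotically chi-squared with 3 degrees of freedom here.
d$u2 <- resid(fit)^2
aux <- lm(u2 ~ law + log(income) + age, data = d)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 3, lower.tail = FALSE)

print(c(RESET_p = reset_p, BP_p = bp_p))
```

A small p-value for RESET points to a functional form problem, and a small p-value for Breusch-Pagan points to heteroscedasticity, in which case robust standard errors are the usual remedy.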

Once your model is selected, estimate the effect of the law on fatalities, for all years. To present the results, produce one graph on which the estimated effect of law and its confidence interval are presented in a time series format. If your model has interactions between law and other variables, you need to compute the average partial effect of law for each year and its confidence interval. Discuss the results.

Hint: Here is an example of how to do it for the simplest possible model:

form <- fatalities~law
res <- vector()
for (y in 1983:1997)
{
    reg <- lm(form, subset(data, year==y))
    conf <- confint(reg, 2)  ## confidence interval of the second coefficient (law)
    ans <- c(conf[1], coef(reg)[2], conf[2])  ## lower bound, estimate, upper bound
    res <- rbind(res, ans)  ## one row per year
}
matplot(1983:1997, res, lty=c(2,1,2), col=c(2,1,2), lwd=2, type="l",
        xlab="year", ylab=expression(beta[1]),
        main="Effect of Seat Belt law on traffic\n fatalities")
abline(h=0)

[Figure: Effect of Seat Belt law on traffic fatalities; x-axis: year, y-axis: β1.]
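
When law is interacted with another variable, its average partial effect is a linear combination of coefficients, so its standard error can be obtained from the coefficient covariance matrix. The following is a minimal sketch for a single year, on simulated data and with a hypothetical interaction between law and log(income). It treats the sample mean of log(income) as fixed, which is an assumption:

```r
# Simulated single-year cross-section (NOT the real seatbelts data).
set.seed(2)
n <- 51
d <- data.frame(law = rbinom(n, 1, 0.5), income = runif(n, 15, 30))
d$fatalities <- 0.03 - 0.002 * d$law + 0.0001 * d$law * (d$income - 22) -
  0.0003 * d$income + rnorm(n, sd = 0.002)

# Hypothetical specification with an interaction term.
reg <- lm(fatalities ~ law * log(income), data = d)

# Partial effect of law: b_law + b_int * log(income).
# Averaging over the sample gives weights (1, mean(log(income))).
w <- c("law" = 1, "law:log(income)" = mean(log(d$income)))
b <- coef(reg)[names(w)]
V <- vcov(reg)[names(w), names(w)]

ape <- sum(w * b)                      # average partial effect of law
se  <- sqrt(drop(t(w) %*% V %*% w))    # standard error of the combination
ci  <- ape + c(-1, 1) * qt(0.975, df.residual(reg)) * se

print(c(lower = ci[1], ape = ape, upper = ci[2]))
```

Placing this computation inside the loop of the hint, in place of confint(reg, 2) and coef(reg)[2], gives one row per year for matplot(). An equivalent shortcut is to center the interacted variable at its mean, in which case the coefficient of law is itself the average partial effect.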

Part II

We have not learned how to estimate panel data models, so in this part we ignore the panel structure and treat the data as if they were cross-sectional. You should have realized in Part I that the sample size may be too small to identify the effect of the law on fatalities (it is not too late to add that to your previous discussion). One benefit of using panel data is the sample size. Since we have 51 states and 15 years, the sample size is equal to 765 when all years are used. There are, however, issues to take into consideration.

The main problem with panel data is that the year and state dimensions may hide unobserved heterogeneity that is relevant to the analysis. If we do not control for these unobserved characteristics, we may obtain biased estimators. We can control for unobserved year and state heterogeneity by including year and state indicators (or dummies). In R, it simply means that we have to add year and state in the regression. Dummy variables for years and states will automatically be created. Adding such dummy variables is called adding year and state fixed effects to the model. Notice that a model that incorporates these fixed effects will have 64 more coefficients (can you guess why it is not 66?). However, we are not interested in their values, so we do not print them in the final report. We only print the important coefficients, and add a comment that says that year and/or state fixed effects are included.

Another issue with panel data is the computation of the coefficient standard errors (covered in Chapter 8) and testing. It is very likely that you will need to compute robust standard errors and perform robust tests.

Use the same model you selected in Part I, and estimate it using all observations. Compare the effect of the seat belt law on the number of traffic fatalities when (i) neither year nor state fixed effects are included, (ii) only year fixed effects are included, and (iii) both year and state fixed effects are included. You can test if the model is correctly specified and test for heteroscedasticity once more, as the conclusion may differ when all years are used, but do not change your model (for simplicity, but if you want to try other things, go ahead, it is your project). Present and interpret the results. Which of the three models do you trust the most and why? Conclude with a discussion on the main finding of your study. Do you think it is a valid result? Can you think of a way to improve your model?

Hint: Here is how you print your results without the year and state fixed effects, using stargazer:

library(stargazer)
res <- lm(fatalities~law+year+state, data=data)
stargazer(res, type="text", omit=c("year","state"), digits=5)

===============================================
                        Dependent variable:
                    ---------------------------
                            fatalities
-----------------------------------------------
law                          -0.00059*
                             (0.00031)
Constant                    0.03181***
                             (0.00066)
-----------------------------------------------
Observations                    765
R2                              0.
Adjusted R2                     0.
Residual Std. Error     0.00225 (df = 699)
F Statistic         77.35217*** (df = 65; 699)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.
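
The comparison of the three specifications, together with robust standard errors, can be sketched as follows. The panel below is simulated (10 hypothetical states, made-up adoption years), and the robust_se() helper is an illustration written for this sketch, not part of any package. It implements the HC1 heteroscedasticity-robust formula; clustering by state is a common alternative for panels that is not shown here:

```r
# Simulated balanced panel: 10 hypothetical states x 15 years (NOT the real data).
set.seed(3)
d <- expand.grid(year = 1983:1997, state = paste0("S", 1:10))
s <- as.integer(d$state)
adopt <- rep(c(1985, 1987, 1989, 1991, 1993), 2)   # made-up adoption years
d$law <- as.numeric(d$year >= adopt[s])
d$fatalities <- 0.03 + rnorm(10, sd = 0.004)[s] -  # state effect
  0.0005 * (d$year - 1983) -                       # common time trend
  0.001 * d$law + rnorm(nrow(d), sd = 0.001 * (1 + d$law))
d$year <- factor(d$year)   # so that lm() creates year dummies

fits <- list(
  none       = lm(fatalities ~ law, data = d),
  year       = lm(fatalities ~ law + year, data = d),
  year_state = lm(fatalities ~ law + year + state, data = d)
)

# Illustrative helper: HC1 robust standard errors computed by hand.
robust_se <- function(fit) {
  X <- model.matrix(fit)
  u <- resid(fit)
  bread <- solve(crossprod(X))
  meat  <- crossprod(X * u)
  V <- bread %*% meat %*% bread * nrow(X) / (nrow(X) - ncol(X))
  sqrt(diag(V))
}

# One row per specification: estimate of law, usual and robust standard errors.
tab <- t(sapply(fits, function(f)
  c(estimate  = unname(coef(f)["law"]),
    ols_se    = unname(sqrt(diag(vcov(f)))["law"]),
    robust_se = unname(robust_se(f)["law"]))))
print(round(tab, 5))
```

In this simulation the state effects and the time trend are built in by construction, so the estimate of law moves as fixed effects are added. Whether the same happens with the real data is for you to check.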

Part III

The main question is whether wearing a seat belt makes drivers feel safer and, as a result, more careless. A law dummy alone could therefore be the wrong variable to use. For this part, consider the model of Part II (same controls and same functional form) with the following variations:

  • Instead of law, use the variable enforce (R will create a dummy for primary and one for secondary enforcement). Do you see a difference between the two levels of enforcement? Explain.
  • Drivers can always choose not to wear a seat belt even if it is required. Therefore, use the variable seatbelt instead of law. Interpret the new result (the coefficient of seatbelt). Also, using the effects package, show on a graph how seat belt usage affects fatalities (you can also do any other analysis if you want). Interpret.
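
Both variations can be sketched on simulated data as follows. The data-generating process, the absence of controls, and the grid of usage rates are assumptions made for illustration. The last step uses base R predict() as a stand-in for the effects package, so it is not the graph requested above, only the numbers behind a similar one:

```r
# Simulated data (NOT the real data): enforcement level, usage rate, fatalities.
set.seed(4)
n <- 300
d <- data.frame(
  enforce = factor(sample(c("no", "primary", "secondary"), n, replace = TRUE),
                   levels = c("no", "primary", "secondary"))
)
d$seatbelt <- pmin(pmax(0.35 + 0.25 * (d$enforce == "primary") +
                          0.12 * (d$enforce == "secondary") +
                          rnorm(n, sd = 0.1), 0), 1)
d$fatalities <- 0.03 - 0.008 * d$seatbelt + rnorm(n, sd = 0.003)

# Variation 1: enforce is a factor, so lm() creates the two dummies
# enforceprimary and enforcesecondary ("no" is the reference level).
reg1 <- lm(fatalities ~ enforce, data = d)

# t test of H0: primary and secondary enforcement have the same effect.
w <- c(enforceprimary = 1, enforcesecondary = -1)
gap <- sum(w * coef(reg1)[names(w)])
se_gap <- sqrt(drop(t(w) %*% vcov(reg1)[names(w), names(w)] %*% w))
p_equal <- 2 * pt(abs(gap / se_gap), df.residual(reg1), lower.tail = FALSE)
print(c(gap = gap, se = se_gap, p_value = p_equal))

# Variation 2: usage rate instead of the law dummy, with predicted
# fatalities and confidence bands over a grid of usage rates.
reg2 <- lm(fatalities ~ seatbelt, data = d)
grid <- data.frame(seatbelt = seq(0.2, 0.9, by = 0.1))
pred <- predict(reg2, grid, interval = "confidence")
print(cbind(grid, pred))
```

Plotting the fit column of pred against grid$seatbelt, with lwr and upr as bands, shows how predicted fatalities change with seat belt usage.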
