# BTRY 4030 – Fall 2018 – Homework 5, Q1


#### Due Tuesday, December 4, 2018

You may either respond to the questions below by editing `hw5_2018_q1.Rmd` to include your answers and compiling it into a PDF document, or by handwriting your answers and scanning these in.

You may discuss the homework problems and computing issues with other students in the class. However, you must write up your homework solution on your own. In particular, do not share your homework RMarkdown file with other students.
Here we will add one more deletion diagnostic to our arsenal. When comparing two possible models, we often want to ask "Does one predict future data better than the other?" One way to do this is to divide your data into two collections of observations, $(X_1, y_1)$ and $(X_2, y_2)$, say. We use $(X_1, y_1)$ to obtain a linear regression model, with parameters $\hat{\beta}$, and look at the prediction error $(y_2 - X_2 \hat{\beta})^T (y_2 - X_2 \hat{\beta})$.

This is a bit wasteful: you could use $(X_2, y_2)$ to improve your estimate of $\beta$. However, we can assess how well this type of model does (for these data) as follows:
For each observation $i$:

i. Remove $(x_i, y_i)$ from the data and obtain $\hat{\beta}^{(i)}$ from the remaining $n - 1$ data points.

ii. Use this to make a prediction $\hat{y}^{(i)}_i = x_i^T \hat{\beta}^{(i)}$.

Return the cross-validation error

$$ CV = \sum_{i=1}^{n} \left( y_i - \hat{y}^{(i)}_i \right)^2 $$
This can be used to compare models that use different covariates, for example; it is particularly useful when the models are not nested. We will see an example of this in Question 2.

Here, we will find a way to calculate CV without having to manually remove observations one by one.
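The leave-one-out procedure above can be sketched directly in R. The data below are simulated purely for illustration; none of the variable names come from the assignment.

```r
# Brute-force leave-one-out cross validation for a linear model,
# following steps (i)-(ii) above. Data simulated for illustration.
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))      # design matrix with an intercept column
y <- X %*% c(2, 1, -0.5) + rnorm(n)    # response with standard normal errors

cv <- 0
for (i in 1:n) {
  # (i) refit using the n - 1 remaining data points
  beta_i <- solve(t(X[-i, ]) %*% X[-i, ], t(X[-i, ]) %*% y[-i])
  # (ii) predict the held-out observation
  yhat_i <- sum(X[i, ] * beta_i)
  cv <- cv + (y[i] - yhat_i)^2
}
cv
```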
a. We will start by considering a separate test set. As in the midterm, imagine that we have $X_2 = X_1$, but that the errors that produce $y_2$ are independent of those that produce $y_1$. We estimate $\hat{\beta}$ using $(X_1, y_1)$: $\hat{\beta} = (X_1^T X_1)^{-1} X_1^T y_1$. Show that the in-sample average squared error, $(y_1 - X_1 \hat{\beta})^T (y_1 - X_1 \hat{\beta})/n$, is biased downwards as an estimate of $\sigma^2$, but the test-set average squared error, $(y_2 - X_2 \hat{\beta})^T (y_2 - X_2 \hat{\beta})/n$, is biased upwards. (You may find the midterm solutions helpful.)
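A quick simulation (not part of the assignment; all values are illustrative) shows the two bias directions empirically: with true error variance $\sigma^2 = 1$, the in-sample average squared error averages below 1 and the test-set version averages above 1.

```r
# Simulate many training/test pairs sharing the same design X, with
# sigma^2 = 1, and average the two error estimates from part (a).
set.seed(2)
n <- 30; p <- 5; reps <- 2000
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta <- rnorm(p)
in_mse <- test_mse <- numeric(reps)
for (r in 1:reps) {
  y1 <- X %*% beta + rnorm(n)              # training response
  y2 <- X %*% beta + rnorm(n)              # independent test response, same X
  bhat <- solve(t(X) %*% X, t(X) %*% y1)   # least squares from (X, y1)
  in_mse[r]   <- mean((y1 - X %*% bhat)^2)
  test_mse[r] <- mean((y2 - X %*% bhat)^2)
}
mean(in_mse)    # below sigma^2 = 1 (expectation (n - p)/n)
mean(test_mse)  # above sigma^2 = 1 (expectation (n + p)/n)
```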
b. Suppose that $\beta_p = 0$; that is, the final column of $X_1$ has no impact on prediction. Show that the test-set error is smaller if we remove the final column from each of $X_1$ and $X_2$ than if we don't. (This makes using a test set a reasonable means of choosing which covariates to include.)
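The claim in (b) can also be seen by simulation (illustrative only, not a proof): averaged over many replicates, the test-set error is smaller for the model without the spurious column.

```r
# Compare average test-set error with and without a spurious final column
# whose true coefficient is zero. Data simulated for illustration.
set.seed(3)
n <- 30; reps <- 2000
X <- cbind(1, rnorm(n))    # the columns that matter
Z <- cbind(X, rnorm(n))    # same design plus a final column with beta_p = 0
beta <- c(1, 2)
err_small <- err_big <- numeric(reps)
for (r in 1:reps) {
  y1 <- X %*% beta + rnorm(n)   # training response
  y2 <- X %*% beta + rnorm(n)   # independent test response
  b_small <- solve(t(X) %*% X, t(X) %*% y1)
  b_big   <- solve(t(Z) %*% Z, t(Z) %*% y1)
  err_small[r] <- mean((y2 - X %*% b_small)^2)
  err_big[r]   <- mean((y2 - Z %*% b_big)^2)
}
mean(err_small) < mean(err_big)   # dropping the null column helps on average
```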
c. Now we will turn to cross-validation. Use the identity

$$ \hat{\beta}^{(i)} = \hat{\beta} - \frac{1}{1 - h_{ii}} (X^T X)^{-1} x_i \hat{e}_i $$

from class to obtain an expression for the out-of-sample prediction $x_i^T \hat{\beta}^{(i)}$ in terms of $x_i$, $y_i$, $\hat{\beta}$ and $h_{ii}$ only.
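Before using the quoted identity, it can be verified numerically for a single observation; the sketch below uses simulated data and is illustrative only.

```r
# Verify the deletion identity from class numerically for one observation i:
# beta_hat^(i) = beta_hat - (X^T X)^{-1} x_i e_hat_i / (1 - h_ii).
set.seed(4)
n <- 20
X <- cbind(1, rnorm(n), rnorm(n))
y <- X %*% c(1, 2, -1) + rnorm(n)
XtX_inv <- solve(t(X) %*% X)
bhat <- XtX_inv %*% t(X) %*% y           # full-data estimate
e <- y - X %*% bhat                      # residuals e_hat
h <- diag(X %*% XtX_inv %*% t(X))        # leverages h_ii
i <- 1
b_direct   <- solve(t(X[-i, ]) %*% X[-i, ], t(X[-i, ]) %*% y[-i])
b_identity <- bhat - XtX_inv %*% X[i, ] * e[i] / (1 - h[i])
max(abs(b_direct - b_identity))          # numerically zero
```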
d. Hence obtain an expression for the prediction error $y_i - x_i^T \hat{\beta}^{(i)}$ using only $y_i$, $\hat{y}_i$ and $h_{ii}$. You may want to check this empirically using the first few entries of the data used in Question 2.
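The Question 2 data are not reproduced here, so a check of the suggested kind can be run on simulated data instead. The `shortcut` column uses the per-observation term $\hat{e}_i / (1 - h_{ii})$ that appears inside the CV formula given in part (e).

```r
# Empirical check for part (d): compare the direct leave-one-out prediction
# error with e_hat_i / (1 - h_ii), for the first few observations.
set.seed(5)
n <- 20
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
h <- hatvalues(fit)        # leverages h_ii
e <- residuals(fit)        # in-sample residuals e_hat_i
for (i in 1:3) {           # first few observations
  fit_i  <- lm(y[-i] ~ x[-i])                               # refit without obs i
  pred_i <- unname(coef(fit_i)[1] + coef(fit_i)[2] * x[i])  # LOO prediction
  print(c(direct = y[i] - pred_i, shortcut = unname(e[i] / (1 - h[i]))))
}
```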

e. Show that the over-all CV score can be calculated from

$$ CV = \sum_{i=1}^{n} \frac{\hat{e}_i^2}{(1 - h_{ii})^2} $$

that is, without deleting observations, and only requiring the leverages $h_{ii}$.