BTRY 4030 – Fall 2018 – Homework 5 Q1

Put Your Name and NetID Here

Due Tuesday, December 4, 2018

You may either respond to the questions below by editing the hw5_2018_q1.Rmd file to include your answers and compiling it into a PDF document, or by handwriting your answers and scanning them in.

You may discuss the homework problems and computing issues with other students in the class. However, you must write up your homework solution on your own. In particular, do not share your homework RMarkdown file with other students.
Here we will add one more deletion diagnostic to our arsenal. When comparing two possible models, we often want to ask "Does one predict future data better than the other?" One way to do this is to divide your data into two collections of observations $(X_1, y_1)$ and $(X_2, y_2)$, say. We use $(X_1, y_1)$ to obtain a linear regression model, with parameters $\hat{\beta}$, and look at the prediction error $(y_2 - X_2 \hat{\beta})^T (y_2 - X_2 \hat{\beta})$.
This is a bit wasteful: you could use $(X_2, y_2)$ to improve your estimate of $\beta$. However, we can assess how well this type of model does (for these data) as follows:
For each observation $i$:

i. Remove $(x_i, y_i)$ from the data and obtain $\hat{\beta}^{(i)}$ from the remaining $n - 1$ data points.
ii. Use this to make a prediction $\hat{y}^{(i)}_i = x_i^T \hat{\beta}^{(i)}$.

Return the cross-validation error
$$ \mathrm{CV} = \sum_{i=1}^{n} \left( y_i - \hat{y}^{(i)}_i \right)^2. $$
This can be used, for example, to compare models that use different covariates, particularly when the models are not nested. We will see an example of this in Question 2.
Here, we will find a way to calculate CV without having to manually go through removing observations one
by one.
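For concreteness (this is not part of the question), here is a minimal R sketch of the brute-force version of the procedure above; the simulated data and dimensions are invented purely for illustration.

```r
set.seed(1)
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))          # design matrix with an intercept column
y <- drop(X %*% c(1, 2, -0.5)) + rnorm(n)  # simulated response

# Leave-one-out cross validation, done the slow way:
cv <- 0
for (i in 1:n) {
  fit <- lm.fit(X[-i, , drop = FALSE], y[-i])  # refit without observation i
  yhat_i <- sum(X[i, ] * fit$coefficients)     # predict the held-out point
  cv <- cv + (y[i] - yhat_i)^2
}
cv
```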
a. We will start by considering a separate test set. As in the midterm, imagine that we have $X_2 = X_1$, but that the errors that produce $y_2$ are independent of those that produce $y_1$. We estimate $\hat{\beta}$ using $(X_1, y_1)$: $\hat{\beta} = (X_1^T X_1)^{-1} X_1^T y_1$. Show that the in-sample average squared error, $(y_1 - X_1 \hat{\beta})^T (y_1 - X_1 \hat{\beta})/n$, is biased downwards as an estimate of $\sigma^2$, but that the test-set average squared error, $(y_2 - X_2 \hat{\beta})^T (y_2 - X_2 \hat{\beta})/n$, is biased upwards. (You may find the midterm solutions helpful.)
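The result in (a) is to be shown analytically, but a quick simulation can illustrate the direction of the two biases. A hedged sketch, with all parameters invented for demonstration:

```r
set.seed(2)
n <- 30; p <- 5; reps <- 2000
in_err <- test_err <- numeric(reps)
for (r in 1:reps) {
  X1 <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
  X2 <- X1                                  # same design, independent errors
  beta <- rnorm(p)
  y1 <- drop(X1 %*% beta) + rnorm(n)        # true sigma^2 = 1
  y2 <- drop(X2 %*% beta) + rnorm(n)
  betahat <- solve(crossprod(X1), crossprod(X1, y1))
  in_err[r]   <- sum((y1 - X1 %*% betahat)^2) / n
  test_err[r] <- sum((y2 - X2 %*% betahat)^2) / n
}
mean(in_err)    # sits below sigma^2 = 1 on average
mean(test_err)  # sits above it
```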
b. Suppose that $\beta_p = 0$; that is, the final column of $X_1$ has no impact on prediction. Show that the test-set error is smaller if we remove the final column from each of $X_1$ and $X_2$ than if we don't. (This makes using a test set a reasonable means of choosing which covariates to include.)
c. Now we will turn to cross validation. Use the identity
$$ \hat{\beta}^{(i)} = \hat{\beta} - \frac{1}{1 - h_{ii}} (X^T X)^{-1} x_i e_i $$
from class to obtain an expression for the out-of-sample prediction $x_i^T \hat{\beta}^{(i)}$ in terms of $x_i$, $y_i$, $\hat{\beta}$ and $h_{ii}$ only.
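The identity itself can be checked numerically before you rely on it. A sketch under invented data (all names and values my own), comparing an actual leave-one-out refit against the one-step formula:

```r
set.seed(3)
n <- 40
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(1, -1, 0.5)) + rnorm(n)

fit <- lm(y ~ X - 1)           # X already carries the intercept column
e <- residuals(fit)
h <- hatvalues(fit)
XtXinv <- solve(crossprod(X))  # (X^T X)^{-1}

i <- 7                         # any observation index
beta_refit   <- coef(lm(y[-i] ~ X[-i, ] - 1))   # actual refit without row i
beta_formula <- coef(fit) - XtXinv %*% X[i, ] * e[i] / (1 - h[i])
cbind(beta_refit, beta_formula)  # the two columns should agree
```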
d. Hence obtain an expression for the prediction error $y_i - x_i^T \hat{\beta}^{(i)}$ using only $y_i$, $\hat{y}_i$ and $h_{ii}$. You may want to check this empirically using the first few entries of the data used in Question 2.
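One way to run that empirical check (a sketch only; with Question 2's data you would replace the simulated X and y below) is to compute the leave-one-out prediction errors by brute force and compare them with whatever expression you derive:

```r
set.seed(4)
n <- 25
X <- cbind(1, rnorm(n))
y <- drop(X %*% c(2, 1)) + rnorm(n)

# Brute-force leave-one-out prediction errors y_i - x_i^T betahat^{(i)}:
loo_err <- sapply(1:n, function(i) {
  b <- coef(lm(y[-i] ~ X[-i, ] - 1))
  y[i] - sum(X[i, ] * b)
})
head(loo_err)   # compare with your expression in y_i, yhat_i and h_ii
```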
e. Show that the overall CV score can be calculated as
$$ \mathrm{CV} = \sum_{i=1}^{n} \frac{e_i^2}{(1 - h_{ii})^2}, $$
that is, without deleting observations, and requiring only the leverages $h_{ii}$.
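Once (e) is established, the whole CV score is a single line of R. A minimal sketch using the residuals and hatvalues accessors, on simulated data for illustration:

```r
set.seed(5)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

# CV from the fitted model alone: no observation is ever deleted.
sum(residuals(fit)^2 / (1 - hatvalues(fit))^2)
```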