STAT3057/6057: Assignment 2 Questions

Semester 1, 2022

INSTRUCTIONS:

Total Marks: STAT3057: 75; STAT6057: 80

There are 4 questions in this assignment.

Question 4(g) should be answered only by students enrolled in STAT6057.

The due date for this assignment is: 5PM, FRIDAY 20 MAY (Week 11).

A submission link for the assignment will be placed on Wattle prior to the due date.

Please submit your assignment as a PDF document. Do not submit in other formats.

Make sure to include all working out steps in your assignment answers.

This assignment involves the use of R for parts of Question 3 and Question 4. For questions that require the use of R, you are required to submit the R code that you use. Do not submit the R code as a separate file. Instead, just include the relevant R code in your answers to the relevant questions.

DATA FOR ASSIGNMENT

You will need to use the data file Assignment2TimeSeriesData.csv that has been placed on Wattle to answer Question 4.

Run the code below. You may have to specify the correct path to the csv file corresponding with where you have saved the file:

TSdata <- read.csv("Assignment2TimeSeriesData.csv", header = TRUE)$value

QUESTION 1 (17 marks)

a) Annual losses are modelled by a Pareto distribution with \( \alpha = 2 \) and \( \lambda = 2000 \). A certain insurance
plan pays losses incurred for an insured policy holder subject to the following provisions:
  • the policy holder pays 100% of any loss amounts up to a deductible (policy excess) of $250.
  • the policy holder then pays 25% of any losses between $250 and $2250.
  • the policy holder pays 100% of losses above $2250 until the policy holder has paid $3600 in total.
  • the policy holder then pays 5% of any remaining losses.
For example, if a policy holder incurs a loss of $3000, they would be required to pay:
250 + 0.25(2250 - 250) + (3000 - 2250) = $1500.
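The four provisions can be sketched as a small R function, which makes it easy to confirm the worked example. The function name `policyholder_payment` is hypothetical. The breakpoint of $5100 is derived rather than stated: the first two layers cost the policy holder at most 250 + 500 = $750, so a further $2850 at 100% is needed to reach $3600.

```r
# Amount paid by the policy holder for a single annual loss, following the
# four provisions listed above. Works on vectors of losses.
policyholder_payment <- function(loss) {
  deductible <- pmin(loss, 250)                           # 100% of the first $250
  coinsured  <- 0.25 * pmin(pmax(loss - 250, 0), 2000)    # 25% of the $250 to $2250 layer
  # 100% above $2250 until total payments reach $3600; this layer is capped
  # at 3600 - 750 = 2850, so it ends at a loss of 2250 + 2850 = 5100.
  full_layer <- pmin(pmax(loss - 2250, 0), 3600 - 750)
  tail_share <- 0.05 * pmax(loss - 5100, 0)               # 5% of any remaining loss
  deductible + coinsured + full_layer + tail_share
}

policyholder_payment(3000)   # 1500, matching the example above
```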
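i) Show that for a Pareto distribution with parameters \( \alpha \) and \( \lambda \),
\[
\int_d^e x f(x)\,dx = \frac{\lambda^{\alpha}}{\alpha - 1}\left(\frac{\alpha d + \lambda}{(\lambda + d)^{\alpha}} - \frac{\alpha e + \lambda}{(\lambda + e)^{\alpha}}\right),
\]
where \( d \ge 0 \), \( e > 0 \). (4 marks)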
ii) For the insurance plan given above, calculate the expected annual amount paid by the policy
holder. (8 marks)
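A numerical sanity check of the identity in part (i) can be sketched in R. This is not a proof, only a comparison against `integrate`. It assumes the Pareto density f(x) = alpha * lambda^alpha / (lambda + x)^(alpha + 1) for x > 0; the helper names `dpareto_al` and `partial_mean` and the limits d = 250, e = 2250 are chosen for illustration.

```r
# Assumed Pareto density in the (alpha, lambda) parameterisation
dpareto_al <- function(x, alpha, lambda) {
  alpha * lambda^alpha / (lambda + x)^(alpha + 1)
}

# Closed form for the integral of x f(x) between d and e, as stated in part (i)
partial_mean <- function(d, e, alpha, lambda) {
  lambda^alpha / (alpha - 1) *
    ((alpha * d + lambda) / (lambda + d)^alpha -
     (alpha * e + lambda) / (lambda + e)^alpha)
}

alpha <- 2; lambda <- 2000; d <- 250; e <- 2250
numeric_value <- integrate(function(x) x * dpareto_al(x, alpha, lambda),
                           lower = d, upper = e, rel.tol = 1e-10)$value
closed_form <- partial_mean(d, e, alpha, lambda)

# The two values should agree up to numerical integration error
all.equal(numeric_value, closed_form, tolerance = 1e-6)
```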
b) An insurance portfolio has claims, \( X \), that are believed to follow a Weibull distribution with pdf:
\[
f_X(x) = \frac{0.3\,x^{-0.7}}{\theta}\exp\left(-\frac{x^{0.3}}{\theta}\right).
\]
The insurer takes out an excess-of-loss reinsurance contract with a retention level of $900. You
observe 13 claims. 4 claims are above the retention level and 9 claims are below the retention
level. The claim amounts below the retention level are:
142, 164, 233, 267, 342, 475, 590, 722, 883
Find the maximum likelihood estimate of \( \theta \). (5 marks)

QUESTION 2 (13 marks)

a) An analyst is assessing the risks of an equity portfolio and wishes to estimate the probability that the
portfolio will incur at least one daily loss exceeding 10% next month. They have 2 years (24 months)
of historic daily loss data.
Explain how a Generalised Extreme Value distribution and the block maxima method could be
used to estimate this probability for this specific situation. What are the limitations of using this
approach? (4 marks)
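The mechanics of the block maxima method might be sketched in R as follows. Everything here is an assumption made for illustration: the daily losses are simulated stand-ins (24 months of 21 trading days, expressed as fractions so that 10% is 0.10), and the GEV likelihood is coded by hand because base R has no GEV routines. The shape parameter is assumed to be non-zero.

```r
set.seed(1)
# Hypothetical stand-in for 24 months of historic daily losses
daily_loss  <- 0.02 * rt(24 * 21, df = 4)
month       <- rep(1:24, each = 21)
monthly_max <- as.numeric(tapply(daily_loss, month, max))  # one block maximum per month

# GEV distribution function for shape xi != 0
pgev <- function(x, mu, sigma, xi) {
  z <- pmax(1 + xi * (x - mu) / sigma, 0)
  exp(-z^(-1 / xi))
}

# Negative log-likelihood of the GEV (sigma is optimised on the log scale)
gev_nll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); xi <- par[3]
  z <- 1 + xi * (x - mu) / sigma
  if (abs(xi) < 1e-8 || any(z <= 0)) return(1e10)  # outside the support
  sum(log(sigma) + (1 + 1 / xi) * log(z) + z^(-1 / xi))
}

# Gumbel-type moment estimates as starting values
scale0 <- sd(monthly_max) * sqrt(6) / pi
start  <- c(mean(monthly_max) - 0.5772 * scale0, log(scale0), 0.1)
fit    <- optim(start, gev_nll, x = monthly_max)

# P(at least one daily loss > 10% next month) = P(monthly maximum > 0.10)
prob_exceed <- 1 - pgev(0.10, fit$par[1], exp(fit$par[2]), fit$par[3])
prob_exceed
```

Using one block per month ties the fitted distribution directly to the "next month" horizon in the question.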
b) You are given the joint CDF:
\[
F_{X,Y}(x,y) = \left(1 + x^{-1/\alpha} + y^{-1/\alpha} - (x + y)^{-1/\alpha}\right)^{-1/\beta},
\]
where \( \alpha > 0 \), \( \beta > 0 \).
i) Calculate the marginal CDFs and the copula \( C(u,v) \). (6 marks)
ii) Two different types of losses have exponential distributions, each with a mean of $600.
Assuming that the dependency between the losses can be modelled using the copula from part
(i), calculate the joint probability that the losses are both less than $200. For this question
you can assume that \( \alpha = 2.5 \) and \( \beta = 4.3 \). (3 marks)

QUESTION 3 (20 marks)

Note: For this question you should answer parts (a), (b) and (c) by hand and not use R. You should use R to answer part (d).

You are given the following time series models:

i) \( x_t = x_{t-1} + w_t - 0.5 w_{t-1} - 0.5 w_{t-2} \)
ii) \( x_t = \frac{9}{4} x_{t-1} + \frac{9}{4} x_{t-2} + w_t - \frac{8}{3} w_{t-1} - w_{t-2} \)

where \( \{w_t\} \) is white noise with variance 1.

For each of these models, answer parts (a), (b), (c) and (d) below:

a) Find the roots of the AR and MA characteristic polynomials, identify the values of p and q for which
they are ARMA(p,q) models, determine whether they are stationary, and determine whether they
are invertible. Make sure to remove redundant parameters before identifying p and q , and before
determining stationarity and invertibility. (5 marks)
Use the irredundant form of the models when answering the other question parts below. (The
irredundant form is the form of the models where redundant parameters have been removed.)
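Although part (a) is to be answered by hand, a root calculation can be checked afterwards with `polyroot` from base R. The sketch below uses a hypothetical model, not one of the assignment models: (1 - 0.7B + 0.1B^2) x_t = (1 - 0.5B) w_t, whose AR polynomial factors as (1 - 0.5z)(1 - 0.2z).

```r
# polyroot takes coefficients in increasing order of power: 1, z, z^2, ...
ar_poly <- c(1, -0.7, 0.1)   # phi(z)   = 1 - 0.7 z + 0.1 z^2
ma_poly <- c(1, -0.5)        # theta(z) = 1 - 0.5 z

ar_roots <- polyroot(ar_poly)   # roots of phi(z)
ma_roots <- polyroot(ma_poly)   # root of theta(z)

# A root shared by both polynomials signals a redundant factor, here (1 - 0.5 z).
# Cancelling it leaves (1 - 0.2B) x_t = w_t, i.e. an AR(1).
shared <- sapply(ar_roots, function(r) any(abs(r - ma_roots) < 1e-8))

# Stationarity is judged on the remaining AR roots:
# all of them must lie outside the unit circle.
Mod(ar_roots[!shared]) > 1
```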
b) Calculate the ACF values \( \rho_1 \), \( \rho_2 \), and the formula for \( \rho_h \) as a function of \( h \) for \( h > 2 \), showing all
working. (7 marks)
c) Find the first four coefficients in the linear process representation of \( x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j} \). (5 marks)
d) Using R functions for time series that we introduced during lectures, simulate and plot 200 observations
from each model, and plot the sample ACFs. (3 marks)
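A minimal sketch of part (d) for a hypothetical ARMA(1,1), x_t = 0.5 x_{t-1} + w_t + 0.4 w_{t-1}, is given below. It assumes the base R functions `arima.sim` and `acf` are among those introduced in lectures, which the assignment does not confirm. `arima.sim` needs the irredundant, stationary form of a model, and R writes MA coefficients with a plus sign. `ARMAacf` and `ARMAtoMA` are included only as a way to check hand calculations for parts (b) and (c).

```r
set.seed(42)
# Hypothetical irredundant ARMA(1,1) with unit-variance white noise
sim <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 200, sd = 1)

plot(sim, xlab = "Time (t)", ylab = expression(x[t]), main = "Simulated ARMA(1,1)")
acf(sim, main = "Sample ACF of simulated ARMA(1,1)")

# Theoretical values for comparison with hand calculations
rho <- ARMAacf(ar = 0.5, ma = 0.4, lag.max = 5)    # rho_0, ..., rho_5
psi <- ARMAtoMA(ar = 0.5, ma = 0.4, lag.max = 4)   # psi_1, ..., psi_4 (psi_0 = 1)
```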

QUESTION 4 (25 marks) (30 marks for STAT6057)

Note: For this question you will need to use R, however, there are restrictions on using R for some of part (f) (see below for details). Part (g) should be answered only by students enrolled in STAT6057.

Use the data TSdata to answer this question. Let \( \{x_t\} \) denote the time series TSdata (where \( t = 1, \dots, 137 \)), and \( \{y_t\} \) denote the differenced time series, i.e., \( y_t = \nabla x_t = x_t - x_{t-1} \) (where \( t = 2, \dots, 137 \)).

For all of the plots that you include in your answer, make sure to appropriately label the axes.

a) Plot the time series data. Describe the key features in the data. (2 marks)

b) Plot out the SACF (the ACF of the data) and SPACF (the PACF of the data), and comment on what you observe in the correlograms. (2 marks)

c) Using first differences of the time series, plot out the SACF and SPACF of the differenced data.
Comment on what you observe in the correlograms. Based on these correlograms, do you think an
AR(p) or MA(q) might be a suitable model for the differenced data? Why or why not? (2 marks)
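The differencing and correlogram steps in parts (b) and (c) might look like the sketch below. The series `x` is a simulated stand-in of length 137 and should be replaced by TSdata once the csv file has been read in.

```r
set.seed(7)
# Stand-in series of length 137 (a random walk); replace with TSdata
x <- ts(cumsum(rnorm(137)))
y <- diff(x)   # y_t = x_t - x_{t-1}, defined for t = 2, ..., 137

par(mfrow = c(1, 2))
acf(y,  main = "SACF of differenced series",  xlab = "Lag", ylab = "ACF")
pacf(y, main = "SPACF of differenced series", xlab = "Lag", ylab = "Partial ACF")
par(mfrow = c(1, 1))
```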

d) Fit an appropriate ARMA model to the differenced data \( (y_t) \). (6 marks)

For part (d) you can use correlograms and/or other model selection methods to decide which
ARMA model is appropriate. You should use the R function arima to undertake the model
fitting. You should describe the process and steps that you have taken to determine your preferred
model for \( y_t \), and include the R output for this model (do not include model output for other
models). You should also write down the equation, including coefficients, for your preferred model
for \( y_t \). NOTE: The arima function in R outputs a parameter called the intercept. This
is an estimate of the mean \( \mu \), and not an estimate of the constant \( \phi_0 \).
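The distinction in the NOTE can be illustrated on simulated data. The sketch below assumes a hypothetical AR(1) with mean 10; for a general ARMA(p,q) the constant equals the mean multiplied by (1 - phi_1 - ... - phi_p).

```r
set.seed(123)
# Hypothetical AR(1) with mean 10: y_t - 10 = 0.6 (y_{t-1} - 10) + w_t
y <- 10 + arima.sim(model = list(ar = 0.6), n = 500)

fit <- arima(y, order = c(1, 0, 0))
fit

mu_hat  <- coef(fit)[["intercept"]]   # R's "intercept" estimates the mean
phi_hat <- coef(fit)[["ar1"]]

# Constant in the form y_t = constant + phi * y_{t-1} + w_t
constant_hat <- mu_hat * (1 - phi_hat)
c(mean = mu_hat, constant = constant_hat)
```

Here the reported intercept should be close to 10, while the constant is close to 10 * (1 - 0.6) = 4.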
e) Carry out diagnostic tests. Comment on the results of each test, and explain if these support the choice
of model that you selected in part (d). (5 marks)
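Typical residual diagnostics for a fitted ARMA model can be sketched as follows. Which tests the course expects is not stated here, so this selection (residual plot, residual SACF, Ljung-Box test, normal Q-Q plot) is an assumption, and the data are a simulated stand-in.

```r
set.seed(99)
y   <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 200)  # stand-in data
fit <- arima(y, order = c(1, 0, 1))
res <- residuals(fit)

plot(res, xlab = "Time (t)", ylab = "Residual", main = "Residuals of fitted model")
acf(res, main = "SACF of residuals")

# Ljung-Box test for remaining autocorrelation; fitdf = p + q removes the
# fitted ARMA parameters from the degrees of freedom
lb <- Box.test(res, lag = 20, type = "Ljung-Box", fitdf = 2)
lb

# Normality check of the residuals
qqnorm(res); qqline(res)
```

The built-in `tsdiag(fit)` produces a similar set of plots in one call.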
f) Fit an ARMA(1,3) model to \( y_t \) using the arima function in R. Based on the coefficient estimates
for this fit, calculate predicted values of \( y_t \) and \( x_t \) for the next 4 time periods. For part (f), do not
use R functions such as predict or forecast to undertake the predictions. You should undertake the
predictions manually and show all workings for the calculations. The most recent observation is the last
observation \( x_{137} \), and the most recent differenced observation is \( y_{137} = x_{137} - x_{136} \). (The differenced
series starts at \( y_2 \).)
Include the following in your answer to part (f):
(i) Write down equations for the predicted values of the differenced series: \( \hat{y}_{137}(1) \), \( \hat{y}_{137}(2) \),
\( \hat{y}_{137}(3) \), and \( \hat{y}_{137}(4) \), and then calculate these predicted values. Hint: You can use the
output of the ARMA(1,3) fit to find the white noise terms \( w_{137}, w_{136}, \dots \), which are the
residuals of the model fit. (5 marks)
(ii) Write down equations for the predicted values of the original series: \( \hat{x}_{137}(1) \), \( \hat{x}_{137}(2) \), \( \hat{x}_{137}(3) \),
and \( \hat{x}_{137}(4) \), and then calculate these values using your results from part (i). (3 marks)
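The manual forecasting recursion can be sketched on a simpler, hypothetical ARMA(1,1) fitted to simulated data. For an ARMA(1,3) the same idea applies, except that residual terms remain in the forecast equations up to three steps ahead. All data and object names below are stand-ins.

```r
set.seed(2022)
# Stand-in data: hypothetical ARMA(1,1) differences y_t with mean 0.2,
# and an original series x_t built as their cumulative sum (x starts at 0)
y <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 300) + 0.2
x <- cumsum(c(0, y))

fit   <- arima(y, order = c(1, 0, 1))
phi   <- coef(fit)[["ar1"]]
theta <- coef(fit)[["ma1"]]
mu    <- coef(fit)[["intercept"]]   # estimate of the mean, not the constant
w     <- residuals(fit)             # used as the white noise terms
n     <- length(y)

y_hat <- numeric(4)
# One step ahead: the last residual still enters through the MA term
y_hat[1] <- mu + phi * (y[n] - mu) + theta * w[n]
# Two or more steps ahead: future white noise has expectation 0
for (h in 2:4) y_hat[h] <- mu + phi * (y_hat[h - 1] - mu)

# Forecasts of the original series: last observation plus cumulated differences
x_hat <- x[length(x)] + cumsum(y_hat)
```

On your own working, the values in `y_hat` can be compared against `predict(fit, n.ahead = 4)$pred` as a private check, even though the submitted answer must show the manual calculation.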

g) STAT6057 only. Write code in R to calculate the white noise terms \( w_{137}, w_{136}, \dots, w_1 \) directly, rather than using the residuals from the model fit. State any assumptions that you make to calculate these values. Show that these values are equal to the residuals from the model fit for recent values of \( t \) (e.g., for \( t > 100 \)). (5 marks)
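The idea behind part (g) can be sketched for a hypothetical ARMA(1,1) on simulated data: rearrange the model equation to solve for the white noise term and iterate forward. The starting assumptions used here (pre-sample value equal to the mean, pre-sample white noise equal to 0) are one possible choice, and their effect dies out as t grows when the fitted model is invertible.

```r
set.seed(2022)
y   <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 300) + 0.2  # stand-in data
fit <- arima(y, order = c(1, 0, 1))
phi   <- coef(fit)[["ar1"]]
theta <- coef(fit)[["ma1"]]
mu    <- coef(fit)[["intercept"]]

n  <- length(y)
yc <- as.numeric(y) - mu   # mean-centred series
w  <- numeric(n)

# Assumptions: y_0 = mu (so the centred pre-sample value is 0) and w_0 = 0
w[1] <- yc[1]
# Model: yc_t = phi * yc_{t-1} + w_t + theta * w_{t-1}, solved for w_t
for (t in 2:n) w[t] <- yc[t] - phi * yc[t - 1] - theta * w[t - 1]

# Largest discrepancy from the arima residuals for t > 100
max_diff <- max(abs(w[101:n] - as.numeric(residuals(fit))[101:n]))
max_diff
```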