
Generalized Linear Models MATH 523 Assignment 2

Q1 Lecture 9a

Consider a binomial GLM with an arbitrary link function $g$ and $n$ responses that have been entered in a grouped format. Using the same notation as in the lecture notes, show that:

(1) The maximum likelihood estimates of $\boldsymbol{\beta}$ do not depend on whether the data have been entered in a grouped or ungrouped format.
(2) The Fisher information matrix does not depend on whether the data have been entered in a grouped or ungrouped format. Conclude that the asymptotic covariance matrix of $\hat{\boldsymbol{\beta}}$ (and consequently the standard errors of $\hat{\beta}_j$, $j = 1, \ldots, p$) does not depend on the data entry format. Hint: it is easiest to verify the entry at position $(j, k)$ of the Fisher information for arbitrary $j, k$, rather than doing the matrix multiplication.
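A quick numerical check can make the claims in Q1 concrete before proving them. The sketch below is not a proof: it uses hypothetical simulated data (the covariate `x`, the denominators `m`, and the probit link are arbitrary illustrative choices) and fits the same binomial GLM twice with base R's `glm()`, once in grouped form with a `cbind(successes, failures)` response and once with one 0/1 row per Bernoulli trial.

```r
# Hypothetical simulated data, used only to check the Q1 claims numerically.
set.seed(523)
n <- 8                                         # number of covariate patterns (groups)
x <- seq(-1, 1, length.out = n)                # one continuous covariate
m <- sample(5:15, n, replace = TRUE)           # binomial denominators m_i
s <- rbinom(n, size = m, prob = pnorm(0.3 + 1.2 * x))  # successes per group

ctrl <- glm.control(epsilon = 1e-12, maxit = 100)  # tight convergence so the fits agree closely

# Grouped format: one row per covariate pattern.
grouped <- data.frame(x = x, s = s, f = m - s)
fit_grouped <- glm(cbind(s, f) ~ x, family = binomial(link = "probit"),
                   data = grouped, control = ctrl)

# Ungrouped format: one 0/1 row per Bernoulli trial.
long <- data.frame(
  x = rep(x, times = m),
  y = unlist(lapply(seq_len(n), function(i) rep(c(1, 0), times = c(s[i], m[i] - s[i]))))
)
fit_ungrouped <- glm(y ~ x, family = binomial(link = "probit"),
                     data = long, control = ctrl)

# Same estimates and standard errors (up to convergence tolerance) ...
print(cbind(grouped = coef(fit_grouped), ungrouped = coef(fit_ungrouped)))
print(cbind(grouped = sqrt(diag(vcov(fit_grouped))),
            ungrouped = sqrt(diag(vcov(fit_ungrouped)))))
# ... but different residual deviances, because the saturated models differ.
print(c(grouped = deviance(fit_grouped), ungrouped = deviance(fit_ungrouped)))
```

The coefficients and standard errors agree up to the convergence tolerance, whereas the residual deviances do not, so any deviance-based goodness-of-fit statement does depend on the entry format.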
Q2 Suppose that $m_i Y_i$ is binomial$(m_i, \pi_i)$, where $g(\pi_i) = \mathbf{x}_i^\top \boldsymbol{\beta}$ and $i = 1, \ldots, n$. Consider the null model, for which $\pi_1 = \cdots = \pi_n$. Show that

$$\hat{\pi} = \frac{\sum_{i=1}^{n} m_i y_i}{\sum_{i=1}^{n} m_i}.$$

When $m_i = 1$ for all $i \in \{1, \ldots, n\}$, show that the Pearson $X^2$ statistic, which is defined as the sum of the squared Pearson residuals, equals $n$. Decide whether or not $X^2$ is useful for testing whether a binomial GLM fits the data well when the response is binary.
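Both statements in Q2 can be checked numerically in base R. This is a sketch on hypothetical simulated data: `m`, `s`, and the success probabilities are arbitrary, and the logit link is used only because it is `glm()`'s default for the binomial family (with all $\pi_i$ equal, the null-model estimate does not depend on the link).

```r
# Hypothetical simulated data, used only to illustrate the two claims in Q2.
set.seed(42)
n <- 12
m <- sample(3:20, n, replace = TRUE)      # group sizes m_i
s <- rbinom(n, size = m, prob = 0.35)     # success counts m_i * y_i
y <- s / m                                # observed proportions y_i
ctrl <- glm.control(epsilon = 1e-12)      # tight convergence for the comparison

# Null model in grouped form: proportions as response, m_i as prior weights.
fit_null <- glm(y ~ 1, family = binomial, weights = m, control = ctrl)
pi_hat <- unname(fitted(fit_null)[1])
pooled <- sum(m * y) / sum(m)             # closed-form estimate stated in Q2
print(c(glm = pi_hat, closed_form = pooled))

# Binary case (all m_i = 1): Pearson X^2 of the null model versus n.
b <- rbinom(50, size = 1, prob = 0.3)
fit_bin <- glm(b ~ 1, family = binomial, control = ctrl)
X2 <- sum(residuals(fit_bin, type = "pearson")^2)
print(c(X2 = X2, n = length(b)))
```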
Q3 R exercise
Consider the following data on home-well contamination in 3020 households in Araihazar upazila, Bangladesh. The response variable is `switch` (binary: whether or not the household switched from an unsafe well to another well). The other variables collected for each household were `arsenic` (the level of arsenic contamination in the household's original well, in hundreds of micrograms per liter), `dist100` (distance in 100-meter units to the closest known safe well), `educ` (years of education of the head of the household) and `assoc` (whether or not any members of the household participated in any community organizations: no or yes). The data are available in MyCourses under Datasets. Load the data and compute `dist100` as follows.
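``````
# Read the wells data (path relative to the course folder layout)
wells <- read.table("../Datasets/wells.dat")
attach(wells)
# Distance to the closest known safe well, rescaled to 100-meter units
dist100 <- dist / 100
``````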

Generalized Linear Models MATH 523 Assignment 2 due on March 25 at noon.

(1) Report whether the data have been entered in a grouped or ungrouped form, and which explanatory variables are continuous and which are factors.
(2) Fit a logistic regression model with the intercept and `arsenic`. Assess the fit of this model graphically as follows: divide `arsenic` into 30 approximately equally filled categories, group the data accordingly, display the empirical logits of switching to a safe well for each category, and display the fitted regression line. Do you think the model is adequate? Perform an approximate goodness-of-fit test of the model using the above binning and Pearson's $X^2$ statistic; conclude at the 5% level.
(3) Find the most appropriate logistic regression model for the data. Use the deviance, but also consider practical significance by looking at the AIC and the size of the effect of the predictors.
(4) Try to simplify the model you found in part (3) by replacing `educ` by a binary factor predictor `feduc`, constructed as follows:
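``````
# feduc = 0 if educ <= 8 years, 1 if educ > 8 years
feduc <- numeric(3020)
for (i in 1:3020) {
  if (educ[i] < 9) { feduc[i] <- 0 }
  if (educ[i] > 8) { feduc[i] <- 1 }
}
# Vectorized equivalent: feduc <- as.numeric(educ > 8)
``````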
This predictor `feduc` records whether the head of the household has at most a primary education (i.e. at most 8 years) or secondary education and above (i.e. 9 or more years).
(5) Compare the final models from parts (3) and (4) using AIC and ROC curves. Which one do you prefer and why? Interpret the final model you selected.
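The binned check in part (2), a deviance and AIC comparison of the kind asked for in part (3), and the ROC summary in part (5) can be sketched in base R. Because `wells.dat` is not reproduced here, the sketch runs on hypothetical simulated stand-ins (`arsenic`, and a response named `switch_well` to avoid confusion with R's `switch()` function); with the real data, substitute the columns loaded above. Several choices are assumptions rather than the course's prescribed method: bins come from quantiles of the covariate, empirical logits use a 0.5 continuity correction, the bin-level $X^2$ uses each bin's mean fitted probability, the reference $\chi^2$ uses (number of bins minus number of parameters) degrees of freedom, and the area under the ROC curve is computed from the Mann-Whitney rank identity with a hypothetical helper `auc()` instead of a ROC package.

```r
# Hypothetical stand-ins for the wells data (NOT the real data set).
set.seed(2020)
n <- 3020
arsenic <- rexp(n, rate = 1 / 1.2) + 0.5                   # simulated arsenic-like levels
switch_well <- rbinom(n, 1, plogis(-0.3 + 0.4 * arsenic))  # simulated 0/1 response

fit0 <- glm(switch_well ~ 1, family = binomial)            # null model
fit  <- glm(switch_well ~ arsenic, family = binomial)      # intercept + arsenic

# Nested-model comparison: drop-in-deviance test and AIC.
print(anova(fit0, fit, test = "Chisq"))
print(AIC(fit0, fit))

# 30 bins of roughly equal size from quantiles of arsenic.
breaks <- quantile(arsenic, probs = seq(0, 1, length.out = 31))
bin <- cut(arsenic, breaks = breaks, include.lowest = TRUE)
m_k <- tapply(switch_well, bin, length)   # households per bin
s_k <- tapply(switch_well, bin, sum)      # switchers per bin
x_k <- tapply(arsenic, bin, mean)         # mean arsenic per bin
p_k <- tapply(fitted(fit), bin, mean)     # mean fitted probability per bin

# Empirical logits (0.5 continuity correction) against the fitted line.
emp_logit <- log((s_k + 0.5) / (m_k - s_k + 0.5))
plot(x_k, emp_logit, xlab = "arsenic (bin mean)", ylab = "empirical logit")
abline(a = coef(fit)[1], b = coef(fit)[2])

# Binned Pearson X^2, referred to chi-square on (bins - parameters) df.
X2 <- sum((s_k - m_k * p_k)^2 / (m_k * p_k * (1 - p_k)))
dof <- length(m_k) - length(coef(fit))
p_value <- pchisq(X2, df = dof, lower.tail = FALSE)
print(c(X2 = X2, df = dof, p_value = p_value))

# Area under the ROC curve via the Mann-Whitney rank identity.
auc <- function(score, y) {
  r  <- rank(score)                       # average ranks for ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
print(auc(fitted(fit), switch_well))
```

If the real `arsenic` values contain many ties, `quantile()` may return repeated break points and `cut()` will then stop with an error; passing `unique(breaks)` avoids this at the cost of fewer than 30 bins.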
Q4 R exercise
Consider a study on the duration of unemployment (1: short-term unemployment, less than 6 months; 0: long-term unemployment) with explanatory variables gender (1: male, 0: female) and level of education (0: lower, 1: higher). The data are summarized in the table below.
| Gender | Education level | Short-term unemployment | Long-term unemployment |
|---|---|---|---|
| 1 | 0 | 313 | 126 |
| 1 | 1 | 90 | 41 |
| 0 | 0 | 196 | 132 |
| 0 | 1 | 42 | 43 |
(1) Analyze these contingency table data with logistic regression, using the duration of unemployment as the response.

(2) Describe the dependence relationship between the explanatory variables and the response (conditional independence, homogeneous association, etc.) in the model selected in part (1).
(3) For the model selected in part (1), calculate the relevant odds ratios that describe the effect of the explanatory variable(s) on the response, along with 95% confidence intervals.
(4) Calculate the expected counts from the model selected in part (1) and compare them to the observed counts using Pearson's $X^2$ statistic. Test goodness of fit using an appropriate $\chi^2$ null distribution and conclude at the 5% level.
(5) Interpret the final model in one or two sentences in layman's terms that a non-statistician can understand (no formulas).
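The mechanics behind parts (1), (3) and (4) can be sketched in base R by entering the table in grouped form with a `cbind(short, long)` response. The main-effects model below is fitted only to illustrate the computations; which model to select is the question in part (1). The sketch assumes Wald intervals $\exp\big(\hat{\beta}_j \pm z_{0.975}\,\widehat{\mathrm{SE}}(\hat{\beta}_j)\big)$ for the odds ratios, and computes $X^2$ over all eight observed cells, which equals the sum of squared Pearson residuals of the binomial fit. Object names such as `unemp` are hypothetical.

```r
# Q4 table in grouped form: one row per gender x education cell.
unemp <- data.frame(
  gender    = factor(c(1, 1, 0, 0)),   # 1 = male, 0 = female (baseline)
  education = factor(c(0, 1, 0, 1)),   # 0 = lower (baseline), 1 = higher
  short     = c(313, 90, 196, 42),     # short-term unemployment counts
  long      = c(126, 41, 132, 43)      # long-term unemployment counts
)

# Main-effects logistic model, fitted here only to show the mechanics.
fit <- glm(cbind(short, long) ~ gender + education,
           family = binomial, data = unemp)

# Odds ratios with 95% Wald intervals: exp(beta_hat +/- z * SE), intercept dropped.
est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))
z   <- qnorm(0.975)
odds_ratios <- exp(cbind(OR = est, lower = est - z * se, upper = est + z * se))[-1, ]
print(odds_ratios)

# Expected counts in both columns, then Pearson X^2 over all eight cells.
m <- unemp$short + unemp$long
expected <- cbind(short = m * fitted(fit), long = m * (1 - fitted(fit)))
observed <- cbind(short = unemp$short, long = unemp$long)
X2 <- sum((observed - expected)^2 / expected)
dof <- df.residual(fit)                # 4 cells - 3 parameters = 1
p_value <- pchisq(X2, df = dof, lower.tail = FALSE)
print(c(X2 = X2, df = dof, p_value = p_value))
```

If a different model is selected in part (1), only the formula passed to `glm()` changes; `df.residual()` then supplies the matching degrees of freedom for the $\chi^2$ reference distribution.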
Due on March 25 at noon.