Contingency Table Analysis
代做arm | ios – 这是一个arm面向对象设计的practice, 考察arm的理解, 是比较有代表性的arm/ios等代写方向
Johanna G. Neslehova
09/03/
The effectiveness of AZT for HIV patients
In the following study, 338 HIV positive patients were randomly assigned to either receiveaztimmediately or only after their T cells showed severe immune weakness. The response is whether the patient developed AIDS during the 3-year study or not. Theraceof the patient (A for African American and W for white American) was recorded as well, so that possible genetic factors can be detected.
aids <- c (14,32,11,12) naids <- c (93,81,52,43) azt <- factor ( c ("Y","N","Y","N"),levels= c ("Y","N")) race <- factor ( c ("W","W","A","A"),levels= c ("W","A"))
Notice that the data are entered in the grouped form and thataztandraceare both factor predictors with 2 levels. The data can thus be summarized in a 2 2 2 contingency table.
ftable ( xtabs ( cbind (aids,naids) ~ azt + race))
aids naids
azt race
Y W 14 93
A 11 52
N W 32 81
A 12 43
There are altogether 5 logistic models we can fit to these data. Fit them using the functionglm, and store them.
m0 <- glm ( cbind (aids,naids) ~ 1,family=binomial) m1 <- glm ( cbind (aids,naids) ~ azt,family=binomial) m2 <- glm ( cbind (aids,naids) ~ race,family=binomial) m3 <- glm ( cbind (aids,naids) ~ azt + race,family=binomial) m4 <- glm ( cbind (aids,naids) ~ azt ***** race,family=binomial)
Now explore each model in turn. Start with the model 1 , containing only the intercept. Calculate the fitted probabilities and the fitted odds ratios.
prob <- predict (m0,type="response") names (prob) <- c ("YW","NW","YA", "NA") prob
YW NW YA NA
0.204142 0.204142 0.204142 0.
prob / (1 – prob)
YW NW YA NA
0.2565056 0.2565056 0.2565056 0.
as.numeric ((prob[2] ***** (1 – prob[1])) / (prob[1] ***** (1 – prob[2])))
[1] 1
as.numeric ((prob[4] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[4])))
[1] 1
as.numeric ((prob[1] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[1])))
[1] 1
as.numeric ((prob[2] ***** (1 – prob[4])) / (prob[4] ***** (1 – prob[2])))
[1] 1
Summarize your observations here. In this model, all probabilities of developing HIV are the same, all odds rat ios are 1 , the response is independent of (azt,race)
Next, move to model1+aztand do the same thing again.
prob <- predict (m1,type="response") names (prob) <- c ("YW","NW","YA", "NA") prob
YW NW YA NA
0.1470588 0.2619048 0.1470588 0.
prob / (1 – prob)
YW NW YA NA
0.1724138 0.3548387 0.1724138 0.
as.numeric ((prob[2] ***** (1 – prob[1])) / (prob[1] ***** (1 – prob[2])))
[1] 2.
as.numeric ((prob[4] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[4])))
[1] 2.
as.numeric ((prob[1] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[1])))
[1] 1
as.numeric ((prob[2] ***** (1 – prob[4])) / (prob[4] ***** (1 – prob[2])))
[1] 1
Summarize your observations here. The probability of developing HIV depends only onazt. The odds ratio of developing HIV whenaztis given compared to when it is not does not depend onrace. The odds ratio of developing HIV for African Americans vs. white Americans is one at each level atazt, The response is independent ofracegivenazt.
Then move on to the model 1+race. Again, calculate the fitted probabilities and odds ratios.
prob <- predict (m2,type="response") names (prob) <- c ("YW","NW","YA", "NA") prob
YW NW YA NA
0.2090909 0.2090909 0.1949153 0.
prob / (1 – prob)
YW NW YA NA
0.2643678 0.2643678 0.2421053 0.
as.numeric ((prob[2] ***** (1 – prob[1])) / (prob[1] ***** (1 – prob[2])))
[1] 1
as.numeric ((prob[4] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[4])))
[1] 1
as.numeric ((prob[1] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[1])))
[1] 1.
as.numeric ((prob[2] ***** (1 – prob[4])) / (prob[4] ***** (1 – prob[2])))
[1] 1.
Summarize your findings here. The response is independent ofaztgivenrace
Then move on to the model1+azt+race. Again, calculate the fitted probabilities and odds ratios.
prob <- predict (m3,type="response") names (prob) <- c ("YW","NW","YA", "NA") prob
YW NW YA NA
0.1496245 0.2653998 0.1427012 0.
prob / (1 – prob)
YW NW YA NA
0.1759511 0.3612847 0.1664545 0.
as.numeric ((prob[2] ***** (1 – prob[1])) / (prob[1] ***** (1 – prob[2])))
[1] 2.
as.numeric ((prob[4] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[4])))
[1] 2.
as.numeric ((prob[1] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[1])))
[1] 1.
as.numeric ((prob[2] ***** (1 – prob[4])) / (prob[4] ***** (1 – prob[2])))
[1] 1.
Summarize your findings here. the probability of developing HIV depends on both } andrace. the odds ratio of developing HIV whenaztis given compared to when it is not does not depend onrace, The odds ratio of develpoing HIV for African vs white Americans does not depend on whetheraztis given or not since this odds ratio is close to 1 here. This model describes homogeneous association.
Finally, explore the model1+azt+race+azt:race. Calculate the fitted probabilities, fitted odds ratios as well as the observed probabilities and odds ratios.
prob <- predict (m4,type="response") names (prob) <- c ("YW","NW","YA", "NA") prob
YW NW YA NA
0.1308411 0.2831858 0.1746032 0.
prob / (1 – prob)
YW NW YA NA
0.1505376 0.3950617 0.2115385 0.
as.numeric ((prob[2] ***** (1 – prob[1])) / (prob[1] ***** (1 – prob[2])))
[1] 2.
as.numeric ((prob[4] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[4])))
[1] 1.
as.numeric ((prob[1] ***** (1 – prob[3])) / (prob[3] ***** (1 – prob[1])))
[1] 0.
as.numeric ((prob[2] ***** (1 – prob[4])) / (prob[4] ***** (1 – prob[2])))
[1] 1.
Summarize your findings here. What do you observe about the fitted and observed quantities? Why do you think this is the case?m4is saturated. Because it contains the interaction betweenaztandrace, the odds ratios now depend on the level of the variable that is being held fixed. The fitted and observed counts are the same inm4because the model is saturated.
Finally, use tests of deviance to find the most appropriate model for these data. Dont forget to do test whether the model you selected is adequate for the data in the first place (and be sure to understand why you can use a test based on deviance to do this).
anova (m4,m3,test="Chisq")
Analysis of Deviance Table
Model 1: cbind(aids, naids) ~ azt * race
Model 2: cbind(aids, naids) ~ azt + race
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 0 0.
2 1 1.3835 -1 -1.3835 0.
anova (m3,m1,test="Chisq")
Analysis of Deviance Table
Model 1: cbind(aids, naids) ~ azt + race
Model 2: cbind(aids, naids) ~ azt
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 1 1.
2 2 1.4206 -1 -0.037084 0.
anova (m1,m0,test="Chisq")
Analysis of Deviance Table
Model 1: cbind(aids, naids) ~ azt
Model 2: cbind(aids, naids) ~ 1
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 2 1.
2 3 8.3499 -1 -6.9293 0.008479 **
—
Signif. codes: 0′‘0.001 ‘‘ 0.01’‘ 0.05′.’ 0.1” 1
summary (m1)
Call:
glm(formula = cbind(aids, naids) ~ azt, family = binomial)
Deviance Residuals:
1 2 3 4
-0.4813 0.5102 0.6026 -0.
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7579 0.2166 -8.117 4.77e-16 ***
aztN 0.7218 0.2787 2.590 0.00961 **
—
Signif. codes: 0′‘0.001 ‘‘ 0.01’‘ 0.05′.’ 0.1” 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8.3499 on 3 degrees of freedom
Residual deviance: 1.4206 on 2 degrees of freedom
AIC: 22.
Number of Fisher Scoring iterations: 4
pchisq ( deviance (m1),df=2, lower.tail = FALSE)
[1] 0.
Using the above insights, interpret the final model in one simple sentence in terms of independence/association relations between the predictors and the response. The modelm1means that race is independent of aids given azt, but there is association between aids and azt.
The modelm1with onlyaztas the main effect seems the most appropriate.Because the data are in a grouped form, we can use the deviance to test whether it is adequate. The p value is so high so there is no lack of fit of modelm1. ## Retrospective study about lung cancer
The second study is one of the first studies to link lung cancer to smoking. Richard Doll and Austin Bradford Hill collected data from 20 hospitals in London, England. It is a retrospective study, in which 709 cases and 709 controls were selected. From the cases, 688 reported being a smoker (having smoked at least one cigarette per day for at least one year), while from among the controls, 650 reported being a smoker.
To investigate the effect of smoking, we can proceed to fit the model with smoking as a predictor (see the class notes for why it makes sense, even though smoking is random while the response is not).
m.lc <- glm ( cbind ( c (688,21), c (650,59)) ~ factor ( c ("Yes","No")),family = binomial)
By looking at the summary, we see that smoking is highly significant.
summary (m.lc)
##
## Call:
## glm(formula = cbind(c(688, 21), c(650, 59)) ~ factor(c("Yes",
## "No")), family = binomial)
##
## Deviance Residuals:
## [1] 0 0
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0330 0.2541 -4.065 4.80e-05 ***
## factor(c("Yes", "No"))Yes 1.0898 0.2599 4.193 2.75e-05 ***
## ---
## Signif. codes: 0'***'0.001 '**' 0.01'*' 0.05'.' 0.1'' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.9878e+01 on 1 degrees of freedom
## Residual deviance: 2.6201e-14 on 0 degrees of freedom
## AIC: 16.
##
## Number of Fisher Scoring iterations: 3
What do the quantities below represent? They are the probabilities of a person developing lung cancer given he/she is a smoker(1)/non- smoker(2) and given he/she has been sampled the study. predict (m.lc,type="response")
## 1 2
## 0.5142003 0.
Now estimate the odds ratio. The odds of developing lung cancer for a smoker is higher than the odds of
developing lung cancer for a non-smoker, by a factor of
beta <- coef (m.lc)
exp (beta[2])
## factor(c("Yes", "No"))Yes
## 2.
Now calculate the 95% confidence interval for the odds ratio:
library (arm)
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lme
##
## arm (Version 1.12-2, built: 2021-10-15)
## Working directory is /Users/zhimingzhang/Desktop
se <- se.coef (m.lc)
se
(Intercept) factor(c("Yes", "No"))Yes
0.2541027 0.
c ( exp (beta[2] -qnorm (0.975) ***** se[2]), exp (beta[2] +qnorm (0.975) ***** se[2]))
factor(c("Yes", "No"))Yes factor(c("Yes", "No"))Yes
1.786737 4.
Based on this data, is there evidence that smoking significantly increases the risk of developing lung cancer?