# 632 HW

##### 2019/4/

```r
library(ISLR)
library(tree)
library(rpart.plot)
```

```
## Warning: package 'rpart.plot' was built under R version 3.5.
```

```r
library(rpart)
library(caret)
```

```
## Warning: package 'caret' was built under R version 3.5.
```

```r
library(randomForest)
```

```
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
##     margin
```

###### Q1:
a.

```r
set.seed(1)
train = sample(1:nrow(Carseats), nrow(Carseats) / 2)
training = Carseats[train, ]
testing = Carseats[-train, ]  # the minus sign selects the held-out half
```

b.

```r
reg_tree = tree(Sales ~ ., data = Carseats, subset = train)
summary(reg_tree)
```

```
## Distribution of residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -4.2570 -1.0360  0.1024  0.0000  0.9301  3.
```
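The summary also reports the number of terminal nodes, which can be verified directly from the fitted object. A minimal check, relying on the tree package's convention that leaves appear in the frame with `var == "<leaf>"`:

```r
# Count terminal nodes: leaves are rows of the frame with var == "<leaf>".
sum(reg_tree$frame$var == "<leaf>")
```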

```r
reg_model <- rpart(Sales ~ ., data = training, method = "anova")
rpart.plot(reg_model)
```

(rpart.plot output: regression tree for Sales, splitting on ShelveLoc, Price, Age, CompPrice, Advertising, and US.)

```r
yhat = predict(reg_tree, newdata = testing)
mean((yhat - testing$Sales)^2)
```

```
## [1] 4.148897
```

As the plot shows, 82% of the observations fall in the branch where Price >= 121; within that branch, 30% have Age >= 67 and 53% have Age >= 51, while the Price >= 113 branch holds 18%. The test MSE is 4.148897.
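Since the same MSE computation recurs in parts c through e, a small helper keeps it in one place. This is a minimal sketch; the name `test_mse` is ours, and it assumes `training` and `testing` from part a are in scope:

```r
# Hypothetical helper: test MSE for any regressor with a predict() method.
test_mse <- function(model, data, response) {
  preds <- predict(model, newdata = data)
  mean((preds - data[[response]])^2)
}

test_mse(reg_tree, testing, "Sales")  # should reproduce 4.148897 above
```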

c.

```r
trControl <- trainControl(method = "cv", number = 10)
crossvali_model <- train(Sales ~ ., data = training, method = "rpart",
                         trControl = trControl,
                         tuneGrid = expand.grid(cp = seq(0, 0.4, length.out = 30)))
```

```
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
```

```r
plot.train(crossvali_model)
```

(Plot: cross-validated RMSE against the complexity parameter cp over [0, 0.4].)
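The cp value selected by the resampling can be read off the fitted `train` object; `bestTune` is a standard caret accessor:

```r
# Inspect the cp chosen by 10-fold cross-validation.
crossvali_model$bestTune
```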

```r
New_tree = prune.tree(reg_tree, best = 8)
rpart.plot(crossvali_model$finalModel)
```

(rpart.plot output: the final pruned tree selected by cross-validation.)

```r
yhat = predict(New_tree, newdata = testing)
mean((yhat - testing$Sales)^2)
```

```
## [1] 5.09085
```

As the result shows, the pruned tree increases the test MSE to 5.09085.
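To see how the test MSE depends on the chosen subtree size, one can sweep `best` over a range of sizes. A sketch, assuming `reg_tree` and `testing` are still in scope:

```r
# Sketch: test MSE for pruned subtrees of size 2 through 10.
sizes <- 2:10
mse_by_size <- sapply(sizes, function(b) {
  pruned <- prune.tree(reg_tree, best = b)
  mean((predict(pruned, newdata = testing) - testing$Sales)^2)
})
data.frame(size = sizes, test_mse = mse_by_size)
```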

d.

```r
set.seed(1)
bagging = randomForest(Sales ~ ., data = training, mtry = 10, importance = TRUE)
yhat_bagging = predict(bagging, newdata = testing)
mean((yhat_bagging - testing$Sales)^2)
```

```
## [1] 2.614642
```

```r
importance(bagging)
```

```
## US 7.2258536 17.
```

```r
varImpPlot(bagging)
```

(varImpPlot output: Price and ShelveLoc rank as the most important variables.)

As the plot shows, the most important variables are Price and ShelveLoc (the quality of the shelving location at each site). The test MSE with the bagging approach is 2.614642.
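Because the printed importance table is truncated above, sorting it makes the ranking explicit; `importance()` returns a matrix with a `%IncMSE` column when the forest is fit with `importance = TRUE`:

```r
# Rank predictors by %IncMSE (mean increase in MSE when permuted).
imp <- importance(bagging)
imp[order(imp[, "%IncMSE"], decreasing = TRUE), ]
```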

e.

```r
set.seed(1)
random_forst = randomForest(Sales ~ ., data = training, mtry = 3, importance = TRUE)
yhat_random = predict(random_forst, newdata = testing)
mean((yhat_random - testing$Sales)^2)
```

```
## [1] 3.237463
```

The test MSE is 3.237463, which is larger than the bagging test MSE, so random forests do not improve on bagging in this case.
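Whether any mtry between 1 and 10 beats the bagging fit can be checked with a small sweep. A sketch, assuming `training` and `testing` from part a (this refits ten forests, so it takes a moment):

```r
# Sketch: test MSE as a function of mtry; mtry = 10 is the bagging fit.
set.seed(1)
mse_by_mtry <- sapply(1:10, function(m) {
  fit <- randomForest(Sales ~ ., data = training, mtry = m)
  mean((predict(fit, newdata = testing) - testing$Sales)^2)
})
data.frame(mtry = 1:10, test_mse = mse_by_mtry)
```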

###### Q2:
a.

```r
set.seed(1)
train = sample(1:nrow(OJ), 800)  # dim(OJ) returns both dimensions; nrow is needed here
training_OJ = OJ[train, ]
testing_OJ = OJ[-train, ]
```

b.

```r
OJ_tree = tree(Purchase ~ ., data = training_OJ)
summary(OJ_tree)
```

```
## Misclassification error rate: 0.165 = 132 / 800
```

The training error rate is 0.165, i.e. 132 of the 800 training observations are misclassified (800 is the number of training observations, not terminal nodes; the terminal-node count is reported on a separate line of the summary output).
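The reported rate can also be verified directly from the predictions rather than read off the summary; a quick check:

```r
# Recompute the training misclassification rate from scratch.
train_pred <- predict(OJ_tree, newdata = training_OJ, type = "class")
mean(train_pred != training_OJ$Purchase)  # should match 0.165
```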

c.

```r
OJ_tree
```

```
## 13) ListPriceDiff > 0.235 102 69.76 CH ( 0.89216 0.10784 ) *
##  7) LoyalCH > 0.764572 278 86.14 CH ( 0.96403 0.03597 ) *
```

Terminal nodes are marked with an asterisk. For example, node 9 has the split criterion LoyalCH > 0.0356415; that branch contains 109 observations with a deviance of 100.90. About 17.4% of the observations in the branch take the value CH, and the remaining 82.6% take the value MM.

d.

```r
rpart_modelOJ <- rpart(Purchase ~ ., data = training_OJ, method = "class",
                       control = rpart.control(cp = 0))
rpart.plot(rpart_modelOJ)
```

(rpart.plot output: the full unpruned classification tree; splits involve LoyalCH, ListPriceDiff, StoreID, SalePriceMM, WeekofPurchase, STORE, PriceDiff, PriceMM, SpecialCH, and DiscMM.)
As the plot above shows, given the information about a particular customer, the tree predicts which brand of orange juice that customer will buy.

e.

```r
Prediction_tree = predict(rpart_modelOJ, newdata = testing_OJ, type = "class")
table(Prediction_tree, testing_OJ$Purchase)
```
```
##
## Prediction_tree  CH  MM
##              CH 132  25
##              MM  27  86
```

```r
(132 + 86) / 270
```
```
## [1] 0.8074074
```

Thus 81% of the test observations are classified correctly, and the test error rate is 19%.
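Rather than typing the counts by hand, the accuracy and error rate can be derived from the confusion matrix itself:

```r
# Accuracy = sum of the diagonal (correct predictions) over all cases.
cm <- table(Prediction_tree, testing_OJ$Purchase)
accuracy <- sum(diag(cm)) / sum(cm)
c(accuracy = accuracy, test_error = 1 - accuracy)
```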
f.

```r
CV_OJ = cv.tree(OJ_tree, FUN = prune.misclass)
CV_OJ
```
```
## $size
## ...
## attr(,"class")
## [1] "prune" "tree.sequence"
```

g.

```r
tree_size <- CV_OJ$size
deviance <- CV_OJ$dev
plot(tree_size, deviance / nrow(training_OJ), type = "b",
     xlab = "tree_size", ylab = "deviance")
```

(Plot: cross-validated deviance per training observation against tree size.)

As the plot shows, the lowest cross-validated classification error rate occurs at a tree size of 5.
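The optimal size can also be pulled out of the `cv.tree` result programmatically and used to prune the tree-package fit; a sketch (`pruned_OJ` is our name, and the plot above suggests `best_size` comes out as 5):

```r
# Pick the size with the lowest cross-validated deviance and prune to it.
best_size <- CV_OJ$size[which.min(CV_OJ$dev)]
best_size
pruned_OJ <- prune.misclass(OJ_tree, best = best_size)
```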

h.

```r
pruned_tree <- rpart(Purchase ~ ., data = training_OJ, method = "class",
                     control = rpart.control(cp = 0.01))
rpart.plot(pruned_tree)
```

(rpart.plot output: the pruned classification tree, splitting on LoyalCH, PriceDiff, and SpecialCH.)
``````
i.

```r
Pred_tree = predict(pruned_tree, newdata = testing_OJ, type = "class")
table(Pred_tree, testing_OJ$Purchase)
(147 + 64) / 270
```

```
## [1] 0.7814815
```

Comparing the two fits, the test accuracy of the unpruned tree (parts d-e) is 0.8074074, while that of the pruned tree here is 0.7814815. The pruned tree therefore has a slightly higher test error rate than the unpruned tree.
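The same comparison can be made without hand-typed counts by computing both error rates from predictions:

```r
# Test error rates for the unpruned (part d) and pruned (part h) trees.
unpruned_err <- mean(predict(rpart_modelOJ, newdata = testing_OJ,
                             type = "class") != testing_OJ$Purchase)
pruned_err <- mean(predict(pruned_tree, newdata = testing_OJ,
                           type = "class") != testing_OJ$Purchase)
c(unpruned = unpruned_err, pruned = pruned_err)
```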