ISE-529 Predictive Analytics
Mid-Term Examination July 25, 2022
Instructions
- You are to complete the exam by typing your answers into this PowerPoint as indicated.
- You will have 90 minutes to complete the exam and submit it to GradeScope (in the same manner as done for homework assignments). Late submissions will be penalized.
- The exam is open-book / open-notes. You may consult any resource except another person.
- Good luck!
For this problem we will be working with the following dataset:
Linear Model Analysis
First, we create three models using X1, X2, and the combination of X1 & X2 to predict Y:
1A) For the two simple (single-predictor) models, are the predictors X1 & X2 significant?
- XXX
1B) For the multiple regression model, which predictors are significant?
- XXX
1C) How do you interpret what is going on here?
- XXX
Now we incorporate the categorical variable into the model by creating
a dummy variable Blue and adding it to the model as shown:
1D) Does adding this categorical variable to the model improve its overall performance? Why or why not?
- XXX
1E) Looking at this color-coded scatterplot of X1 vs Y, do you see any indication of an interaction effect between X1 and X3? Why or why not?
- XXX
1E) Looking at these model results, do you see any indication of an interaction effect between X1 and X3? Why or why not?
- XXX
After completing your modeling analysis, you decide to use the model
shown below:
1F) Write out the algebraic expression for this model (you do not need
to include the error term):
- xxx
1G) Write out the simplified algebraic expression for this model for the
Blue observations
- xxx
1H) Write out the simplified algebraic expression for this model for the
Red observations
- xxx
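The model referred to in 1F through 1H is not reproduced in this text, so the following is only a minimal sketch of how a dummy variable and an interaction term collapse into one line per group. It assumes a model of the form Y = b0 + b1*X1 + b2*Blue + b3*(X1*Blue); the coefficient values and the function names are hypothetical placeholders, not the exam's fitted results.

```python
# Hypothetical coefficients for Y = b0 + b1*X1 + b2*Blue + b3*(X1*Blue).
# These values are placeholders, not the exam's fitted model.
B0, B1, B2, B3 = 1.0, 2.0, 3.0, 0.5


def blue_dummy(color):
    """Encode the categorical variable: 1 for Blue, 0 otherwise (Red)."""
    return 1 if color == "Blue" else 0


def predict(x1, color):
    """Evaluate the full model, including the dummy and interaction terms."""
    blue = blue_dummy(color)
    return B0 + B1 * x1 + B2 * blue + B3 * x1 * blue


def group_line(color):
    """Return (intercept, slope) of the simplified line for one color."""
    blue = blue_dummy(color)
    return (B0 + B2 * blue, B1 + B3 * blue)


print(group_line("Blue"))  # intercept and slope for Blue observations
print(group_line("Red"))   # intercept and slope for Red observations
```

With the dummy set to 0 the Blue-related terms drop out, and with it set to 1 they fold into the intercept and the slope, which is why each color gets its own straight line.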
- We have developed a model to predict the sales (in thousands of dollars) at a new store our company may decide to open in a new city and we define and fit a model with five predictors:
- X1: Population of the city (in thousands of people)
- X2: Average income of the city (in thousands of dollars per adult)
- X3: Type of store (1 for downtown store, 0 for a mall store)
- X1 × X2: Interaction between population and average income (in thousands)
- X2 × X3: Interaction between average income (in thousands) and store type
In the cities we are evaluating, the average income is generally less than $100,000 and the cities are in
the size range of 0 to 500,000 people.
After fitting this model using linear regression, we get the following coefficients: , 20, 50, 350, 0.05, -5
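As a rough sketch of how this fitted model turns inputs into a prediction, the function below evaluates the linear predictor. It assumes the five listed values (20, 50, 350, 0.05, -5) are the slopes for the five predictors in the order they are defined above. The intercept is missing from the coefficient list, so it is passed in as an argument rather than guessed. All function and constant names are illustrative.

```python
# Slopes as listed in the text, assumed to follow the order of the predictors.
# The intercept is not given in the text, so it is a required argument.
B_POP, B_INC, B_TYPE, B_POP_INC, B_INC_TYPE = 20, 50, 350, 0.05, -5


def predicted_sales(population, income, downtown, intercept):
    """Predicted sales in thousands of dollars.

    population: thousands of people
    income:     thousands of dollars per adult
    downtown:   1 for a downtown store, 0 for a mall store
    """
    return (intercept
            + B_POP * population
            + B_INC * income
            + B_TYPE * downtown
            + B_POP_INC * population * income
            + B_INC_TYPE * income * downtown)


def downtown_minus_mall(income):
    """Downtown minus mall predicted sales at fixed population and income."""
    return B_TYPE + B_INC_TYPE * income


# The store-type effect depends on income because of the interaction term.
for inc in (30, 70, 90):
    print(inc, downtown_minus_mall(inc))
```

Because store type enters both on its own and through its interaction with income, the downtown-versus-mall comparison cannot be read from a single coefficient; the helper `downtown_minus_mall` makes that dependence explicit. Likewise, the size of an interaction coefficient depends on the units of the variables it multiplies.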
2A) Which answer is correct?
a) For a fixed value of population and average income, a downtown store would on
average have greater sales than a mall store
b) For a fixed value of population and average income, a mall store would on average
have greater sales than a downtown store
c) For a fixed value of population and average income, a downtown store would on
average have more sales than a mall store provided that the average income is
high enough
d) For a fixed value of population and average income, a mall store would on average
have more sales than a downtown store provided that the average income is high
enough
Response: XXX
2B) What are the predicted sales for a downtown store in a city with a
population of 100,000 and an average income of $50,000?
- $XXX
2C) Is this statement true or false and why: Since the coefficient of
the interaction term between population and average income is very
small, there is very little evidence of an interaction effect:
- XXX
2D) Which predictor has the larger impact on sales, income or city population? Explain your answer.
- XXX
You are assessing four candidate models (M1 through M4). You try
training the models ten different times with different population
samples and then assessing those models against test partitions by calculating their mean squared errors (MSE). The results of those tests are summarized on the following page.
Complete the figure at the bottom of the following page with one
model for each of the four boxes.
            Low Variance    High Variance
Low Bias    XXX             XXX
High Bias   XXX             XXX
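One way to read a set of repeated test MSEs is to compare their average level and their run-to-run spread. The sketch below does this with made-up numbers for two hypothetical models labeled "A" and "B"; these are not the exam's results. As a rough heuristic only, a consistently high average error is often associated with bias, and a large spread across training samples with variance.

```python
from statistics import mean, stdev

# Hypothetical test MSEs from ten resampled training runs per model.
# These numbers are made up for illustration and are not the exam's data.
mse_runs = {
    "A": [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0],
    "B": [0.5, 2.0, 0.2, 1.8, 1.0, 0.3, 1.9, 0.6, 1.5, 0.2],
}


def summarize(runs):
    """Return (mean, sample standard deviation) of a list of MSE values."""
    return mean(runs), stdev(runs)


for name, runs in mse_runs.items():
    avg, spread = summarize(runs)
    print(f"{name}: mean MSE = {avg:.2f}, spread = {spread:.2f}")
```

In this toy data both models have the same average MSE, but "B" varies far more from one training sample to the next.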
4A) Explain in your own words how k-fold cross-validation is
implemented
- xxx
4B) Provide one advantage and one disadvantage of k-fold cross-validation
relative to:
- The validation set approach?
- Advantage: xxx
- Disadvantage: xxx
- Leave-one-out cross-validation (LOOCV)?
- Advantage: xxx
- Disadvantage: xxx
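For reference, the k-fold procedure can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a one-predictor least-squares model and synthetic data; the helper names `fit_line` and `k_fold_mse` are hypothetical and not taken from any course material.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]   # k roughly equal-sized folds
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the other k-1 folds, then score on the held-out fold.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
        fold_mses.append(sum(errs) / len(errs))
    return sum(fold_mses) / k


# Synthetic data: y = 2 + 3x plus Gaussian noise (illustrative only).
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
print(k_fold_mse(xs, ys, k=5))
```

Setting `k` equal to the number of observations makes every fold a single point, which is leave-one-out cross-validation; that requires one model fit per observation.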
The following pages present a residuals diagram and a residuals histogram for each of six different models. For each model, identify the apparent problem(s) with the model and provide one technique that you might use to remediate (correct) the problem.
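The plots themselves are not reproduced in this text. As a hedged companion to reading such plots by eye, the sketch below computes two simple numeric summaries of residuals: skewness, which relates to the shape of the histogram, and the correlation between fitted values and absolute residuals, which relates to a funnel shape in the residuals diagram. The function names and the toy data are assumptions made for illustration.

```python
from statistics import mean, pstdev


def residuals(ys, fitted):
    """Observed minus fitted values."""
    return [y - f for y, f in zip(ys, fitted)]


def skewness(values):
    """Population skewness; near 0 for a symmetric histogram."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = mean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    return cov / (pstdev(a) * pstdev(b))


def spread_trend(fitted, resid):
    """Correlation between fitted values and |residuals|.

    A value far from 0 hints at non-constant variance.
    """
    return correlation(fitted, [abs(r) for r in resid])


# Toy example: residual size grows with the fitted value (illustrative only).
fitted = [float(i) for i in range(1, 11)]
ys = [f + (0.1 * f if i % 2 == 0 else -0.1 * f) for i, f in enumerate(fitted)]
resid = residuals(ys, fitted)
print(skewness(resid), spread_trend(fitted, resid))
```

These summaries do not replace the plots; a curved pattern in the residuals, for example, would not show up in either number.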
Residuals Analysis
5A Model 1
Model Issue: xxx Possible remediation: xxx
5A Model 2
Model Issue: xxx Possible remediation: xxx
5A Model 3
Model Issue: xxx Possible remediation: xxx
5A Model 4
Model Issue: xxx Possible remediation: xxx
5A Model 5
Model Issue: xxx Possible remediation: xxx
5A Model 6
Model Issue: xxx Possible remediation: xxx