
ISE-529 Predictive Analytics


Mid-Term Examination July 25, 2022

Instructions

You are to complete the exam by typing your answers into this PowerPoint as indicated.
You will have 90 minutes to complete the exam and submit it to Gradescope (in the same way as the homework assignments). Late submissions will be penalized.
The exam is open-book / open-notes. You may consult any resource except another person.
Good luck!

For this problem, we will work with the following dataset:

Linear Model Analysis

First, we create three models using X1, X2, and the combination of X1 & X2 to predict Y:

1A) For the two simple (single-predictor) models, are the predictors X1 and X2 significant?

1B) For the multiple regression model, which predictors are significant?

1C) How do you interpret what is going on here?
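The significance questions in 1A and 1B come down to each slope's t-statistic (the estimate divided by its standard error; |t| above roughly 2 corresponds to p of about 0.05 once the sample is moderately large). The NumPy sketch below computes these by hand on made-up data, not the exam's dataset. As an assumption chosen purely for illustration, X2 is built as a near-copy of X1 (collinearity): with data like these, each simple model typically shows a large slope t-statistic, while in the joint model the slope standard errors inflate.

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return (coefficients, t-statistics)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    p = A.shape[1]
    sigma2 = resid @ resid / (n - p)              # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance matrix of beta-hat
    se = np.sqrt(np.diag(cov))                    # standard errors
    return beta, beta / se

# Hypothetical data: X2 is deliberately almost identical to X1.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 2 * x1 + rng.normal(size=n)

for name, X in [("X1", x1), ("X2", x2), ("X1 + X2", np.column_stack([x1, x2]))]:
    beta, t = ols_t_stats(X, y)
    print(name, "slope t-stats:", np.round(t[1:], 2))
```

In practice a regression library reports the same t-statistics and p-values; the hand computation just makes the "estimate over standard error" logic explicit.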

Now we incorporate the categorical variable into the model by creating a dummy variable, Blue, and adding it to the model as shown:
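Dummy coding maps the two colors to 0/1. A minimal pure-Python sketch, assuming the categorical variable takes the values "Blue" and "Red" with Red as the baseline; the helper name `make_dummy` and the sample values are hypothetical. It also builds the X1 × Blue product column, which is what an interaction term between X1 and the color variable adds to a model.

```python
def make_dummy(categories, level="Blue"):
    """Return 1 where the category equals `level`, else 0 (other level = baseline)."""
    return [1 if c == level else 0 for c in categories]

# Hypothetical observations, not the exam's dataset.
colors = ["Red", "Blue", "Blue", "Red"]
x1 = [1.0, 2.0, 3.0, 4.0]

blue = make_dummy(colors)                        # dummy column "Blue"
x1_blue = [a * b for a, b in zip(x1, blue)]      # interaction column X1 * Blue
print(blue, x1_blue)
```

In a fitted model, the Blue coefficient shifts the intercept for Blue observations, and the X1 × Blue coefficient shifts their slope.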

1D) Does adding this categorical variable to the model improve its overall performance? Why or why not?

1E) Looking at this color-coded scatterplot of X1 vs. Y, do you see any indication of an interaction effect between X1 and X3? Why or why not?

1E) Looking at these model results, do you see any indication of an interaction effect between X1 and X3? Why or why not?

After completing your modeling analysis, you decide to use the model shown below:

1F) Write out the algebraic expression for this model (you do not need to include the error term):

1G) Write out the simplified algebraic expression for this model for the Blue observations:

1H) Write out the simplified algebraic expression for this model for the Red observations:
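For a model that contains a dummy variable and its interaction, the Blue and Red equations follow by substituting Blue = 1 and Blue = 0. The sketch below assumes, purely for illustration, a model with X1, Blue, and X1 × Blue; the model actually chosen in the exam may contain different terms, in which case the same substitution applies term by term.

```latex
\begin{align*}
\hat{Y} &= \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2\,\mathrm{Blue} + \hat\beta_3\,(X_1 \times \mathrm{Blue}) \\
\text{Blue observations } (\mathrm{Blue} = 1):\quad \hat{Y} &= (\hat\beta_0 + \hat\beta_2) + (\hat\beta_1 + \hat\beta_3)\,X_1 \\
\text{Red observations } (\mathrm{Blue} = 0):\quad \hat{Y} &= \hat\beta_0 + \hat\beta_1\,X_1
\end{align*}
```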

We have developed a model to predict sales (in thousands of dollars) at a new store that our company may open in a new city. We define and fit a model with five predictors:

X1: Population of the city (in thousands of people)
X2: Average income of the city (in thousands of dollars per adult)
X3: Type of store (1 for a downtown store, 0 for a mall store)
X4: Interaction between population and average income (in thousands), X1 × X2
X5: Interaction between average income (in thousands) and store type, X2 × X3

In the cities we are evaluating, the average income is generally less than $100,000, and city populations range from 0 to 500,000 people.

After fitting this model using linear regression, we get the following coefficients: β̂0 = …, β̂1 = 20, β̂2 = 50, β̂3 = 350, β̂4 = 0.05, β̂5 = -5.
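The model above can be written as a small prediction function. This is a sketch under stated assumptions: the predictors are taken in the listed order (population, income, store type, population × income, income × store type) and the five listed values are read as β̂1 through β̂5. The intercept value is not listed, so it stays an input. The function names are hypothetical.

```python
def predict_sales(beta, pop, income, downtown):
    """Predicted sales (thousands of $) from the five-predictor model.

    beta = (b0, b1, b2, b3, b4, b5); pop in thousands of people,
    income in thousands of $, downtown = 1 (downtown) or 0 (mall).
    """
    b0, b1, b2, b3, b4, b5 = beta
    x1, x2, x3 = pop, income, downtown
    return b0 + b1 * x1 + b2 * x2 + b3 * x3 + b4 * x1 * x2 + b5 * x2 * x3

def downtown_minus_mall(beta, income):
    """Sales gap (downtown minus mall) at fixed population and income: b3 + b5 * income."""
    return beta[3] + beta[5] * income

def marginal_effects(beta, pop, income, downtown):
    """Change in predicted sales per one-unit increase in population and in income."""
    d_pop = beta[1] + beta[4] * income
    d_income = beta[2] + beta[4] * pop + beta[5] * downtown
    return d_pop, d_income

# Assumption: intercept set to 0, so predictions are "sales minus b0".
beta = (0, 20, 50, 350, 0.05, -5)
print(predict_sales(beta, pop=100, income=50, downtown=1))
print(downtown_minus_mall(beta, income=50), downtown_minus_mall(beta, income=80))
```

Because the downtown-minus-mall gap is β̂3 + β̂5 × income, its sign depends on income. The marginal effects are per thousand people and per thousand dollars respectively, so comparing them across predictors also requires thinking about the realistic ranges given above.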

2A) Which answer is correct?

a) For a fixed value of population and average income, a downtown store would, on average, have greater sales than a mall store.
b) For a fixed value of population and average income, a mall store would, on average, have greater sales than a downtown store.
c) For a fixed value of population and average income, a downtown store would, on average, have greater sales than a mall store, provided that the average income is high enough.
d) For a fixed value of population and average income, a mall store would, on average, have greater sales than a downtown store, provided that the average income is high enough.
Response: XXX

2B) What are the predicted sales for a downtown store in a city with a population of 100,000 and an average income of $50,000?

$XXX

2C) Is the following statement true or false, and why? "Since the coefficient of the interaction term between population and average income is very small, there is very little evidence of an interaction effect."
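A coefficient's size depends on the units of its predictor, and the population × income product takes large values. Writing X1 for population and X2 for income (both in thousands), assuming the population × income coefficient β̂4 is the listed value 0.05, and using the city from 2B, the interaction term alone contributes 250 (thousand dollars) to the prediction. Whether the effect is statistically supported is a separate question, answered by its t-statistic or p-value, which require the coefficient's standard error (not given here).

```latex
\begin{align*}
\hat\beta_4 X_1 X_2 &= 0.05 \times 100 \times 50 = 250 \\
t_4 &= \frac{\hat\beta_4}{\widehat{\mathrm{SE}}(\hat\beta_4)}
\end{align*}
```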

2D) Which predictor has the larger impact on sales, income or city population? Explain your answer.

You are assessing four candidate models (M1 through M4). You train each model ten different times on different samples from the population and then assess those models against test partitions by calculating their mean squared errors (MSE). The results of those tests are summarized on the following page.

Complete the figure at the bottom of the following page by placing one model in each of the four boxes.

Low Variance High Variance

Low Bias XXX XXX

High Bias XXX XXX
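One rough way to read repeated test-MSE results: a model whose MSEs are high on average suggests high bias, and one whose MSEs scatter widely across the runs suggests high variance. The hypothetical helper below applies that heuristic with median cut-offs across models. The model names and numbers are made up, not the exam's results, and because average test MSE also contains variance and irreducible error, the labels are a reading aid rather than a bias-variance decomposition.

```python
import statistics

def bias_variance_labels(runs):
    """Heuristic labels: mean test MSE as a bias proxy, stdev as a variance proxy.

    Each axis is split at the median across models, so labels are relative.
    """
    means = {m: statistics.mean(v) for m, v in runs.items()}
    spreads = {m: statistics.stdev(v) for m, v in runs.items()}
    mean_cut = statistics.median(means.values())
    spread_cut = statistics.median(spreads.values())
    return {
        m: ("High Bias" if means[m] > mean_cut else "Low Bias",
            "High Variance" if spreads[m] > spread_cut else "Low Variance")
        for m in runs
    }

# Hypothetical test MSEs from repeated training runs.
runs = {
    "A": [1.0, 1.1, 0.9, 1.0],
    "B": [0.4, 1.6, 0.7, 1.3],
    "C": [3.0, 3.1, 2.9, 3.0],
    "D": [2.4, 3.6, 2.7, 3.3],
}
for model, labels in bias_variance_labels(runs).items():
    print(model, labels)
```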

4A) Explain in your own words how k-fold cross-validation is implemented.
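The usual procedure (shuffle, split the data into k disjoint folds, hold each fold out once while fitting on the other k - 1, then average the k held-out MSEs) can be sketched in pure Python. As an assumption for illustration, the model being validated is a one-predictor least-squares line; the function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate test MSE by k-fold cross-validation (requires 2 <= k <= len(xs))."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)              # randomize before splitting
    folds = [idx[i::k] for i in range(k)]         # k disjoint, near-equal folds
    fold_mses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]  # the other k - 1 folds
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out]
        fold_mses.append(sum(errs) / len(errs))   # MSE on the held-out fold
    return sum(fold_mses) / k                      # average over the k folds

xs = list(range(20))
ys = [3 * x + 1 + (1 if x % 2 else -1) for x in xs]  # hypothetical noisy line
print(k_fold_mse(xs, ys, k=5))
```

Setting k equal to the number of observations turns the same function into leave-one-out cross-validation.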

4B) Provide one advantage and one disadvantage of k-fold cross-validation relative to each of the following approaches:

The validation set approach:
- Advantage: xxx
- Disadvantage: xxx
Leave-one-out cross-validation (LOOCV):
- Advantage: xxx
- Disadvantage: xxx
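Many of the trade-offs here come down to how many times the model must be fit and how much data each fit sees. The hypothetical helper below only tabulates those two quantities; the 50/50 validation split and k = 10 are assumptions, not values from the exam.

```python
def cv_cost(n, method, k=10, train_frac=0.5):
    """Return (number of model fits, training-set size per fit) for n observations."""
    if method == "validation":
        return 1, int(n * train_frac)     # one fit on a single training split
    if method == "kfold":
        return k, n - n // k              # k fits, each on about n(k-1)/k points
    if method == "loocv":
        return n, n - 1                   # n fits, each leaving out one point
    raise ValueError(f"unknown method: {method}")

for m in ("validation", "kfold", "loocv"):
    print(m, cv_cost(1000, m))
```

LOOCV is the special case k = n: it trains on nearly all the data each time but needs n fits, and its n error estimates come from almost identical training sets, which tends to make their average more variable than a k-fold estimate.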

The following pages present a residuals diagram and a residuals histogram for each of six different models. For each model, identify the apparent problem(s) with the model and provide one technique that