Big Data – Data Science Final Assignment January 2022
Submission Deadline: June 2, 2022 (23:59)
!!!ATTENTION – ASSIGNMENT DETAILS!!!
Please answer all the questions below. Submit your answers via Blackboard in PDF or MS Word format. The document should include:
- for Part 1: detailed answers to the question and a link to the article.
- for Part 2: detailed answers, together with:
  - supporting screenshots from AzureML
  - a link to the experiment on Azure Gallery
  - a description of the chosen model
Data: all the datasets are on Blackboard. Details on the datasets are included in the relevant questions.
Good luck!
Part 1: Data Analytics Questions (30 points)
- Binary classification is one of the most common data science tasks.
  a. Explain the type of data science question that binary classification can address.
  b. Find a news or web article with an example of a real case that binary classification could address:
    i. Explain the business opportunity/challenge.
    ii. What performance measure was used? (If it is not mentioned in the article, describe a measure that would suit this task.)
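As a reminder of what a "performance measure" for binary classification can look like, the sketch below computes accuracy, precision, recall, and F1 from 0/1 labels. It is only an illustration: the function names and the label lists are made up for this example and are not part of the assignment data or of AzureML.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Return common binary-classification measures as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels (1 = positive class), invented for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Which measure suits a given article depends on the cost of each error type: when missed positives are expensive, recall usually matters more than plain accuracy.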
Part 2: Data Analytics Azure Task (70 points)
- Congratulations! You landed your dream job as a top manager at BestCars, one of the largest car companies in the world. Your first task is to predict your company's monthly sales in different markets (different countries). Your data include attributes about the country and date, as well as online activity data. Below is the description of the attributes (Variable: Description):
  - Year: Year of sales
  - Month: Month of sales
  - new.site: Indicator if a new website was used
  - country: Country of sales
  - language: Website language
  - Sales: Total monthly sales
  - WebVisits: Number of monthly visits to the website
  - KPI1: Monthly KPI 1 count
  - KPI2: Monthly KPI 2 count
  - KPI3: Monthly KPI 3 count
  - KPI4: Monthly KPI 4 count
  - avg_trends: Google Trends data
  - sum_local_wiki_visits: Wikipedia total monthly visits
  - avg_local_wiki_visits: Wikipedia average monthly visits
Your goal is to build a model to predict the Sales variable. Your dataset is the file Sales_kpis_data.csv, which is attached to the assignment and also appears in the Final Assignment Datasets folder. The file includes 348 rows of monthly data from 9 countries.
a. Test different models (25 points):
  i. Run different prediction models in Azure ML. For each model, describe the data preparation, the model you ran (provide screenshots), and the input parameters, and show the out-of-sample accuracy of the model.
  ii. Which model will you implement for predicting sales?
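"Out-of-sample accuracy" means the error is measured on rows the model did not see during training. The sketch below shows the idea outside Azure ML, under loud assumptions: the (WebVisits, Sales) pairs are invented, the model is a one-feature least-squares line chosen only for brevity, and the helper names are hypothetical. Holding out the last rows rather than a random sample is also an assumption, made here because the data are monthly.

```python
import math


def split_holdout(rows, test_fraction=0.25):
    """Keep the last `test_fraction` of rows as the hold-out set."""
    n_test = max(1, int(round(len(rows) * test_fraction)))
    return rows[:-n_test], rows[-n_test:]


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented (WebVisits, Sales) pairs; NOT taken from Sales_kpis_data.csv.
rows = [(100, 12), (150, 17), (200, 21), (250, 27),
        (300, 31), (350, 37), (400, 41), (450, 46)]

train, test = split_holdout(rows, test_fraction=0.25)
slope, intercept = fit_simple_linear([x for x, _ in train], [y for _, y in train])

actual = [y for _, y in test]
predicted = [slope * x + intercept for x, _ in test]
test_rmse = rmse(actual, predicted)
test_mae = mae(actual, predicted)
print(f"hold-out RMSE: {test_rmse:.3f}, MAE: {test_mae:.3f}")
```

Reporting the same hold-out metrics for every model you try keeps the comparison in (ii) fair.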
b. Feature Selection (15 points)
Run an additional prediction model that uses the feature selection option in Azure ML. The goal is to improve the accuracy of your model compared to the previous results (2a). Describe the model and explain:
i. How did you choose the number of features?
ii. Which methods did you use?
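One common family of feature selection methods is filter-based: score each feature against the target and keep the top k. The sketch below ranks features by absolute Pearson correlation with Sales. It is a minimal illustration, not the Azure ML implementation: the column values are invented, the helper names are hypothetical, and Pearson correlation is only one of several possible scoring methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_k_features(columns, target, k):
    """Rank features by |correlation with target| and keep the best k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Invented values for three of the assignment's column names.
columns = {
    "WebVisits": [100, 150, 200, 250, 300, 350],
    "KPI1": [5, 3, 6, 2, 7, 4],
    "avg_trends": [40, 42, 47, 50, 56, 59],
}
sales = [12, 17, 21, 27, 31, 37]

selected, scores = top_k_features(columns, sales, k=2)
print("selected:", selected)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

One way to justify the number of features in (i) is to repeat the hold-out evaluation for several values of k and keep the smallest k beyond which the error stops improving.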