MAST30034: Applied Data Science

Assignment 2

Project Overview

The aim of this project is to make a qualitative analysis of the New York City Taxi and Limousine Service Trip Record Data. The data set covers trips taken in various types of licensed taxi and limousine services in the New York City area. The data is freely available to download from https:// The whole data set is large and covers many years; you are not expected to analyse all of it, only a subset that you are free to choose. You are also free to choose the tools and techniques you use to perform the analysis. You will be required to prepare a self-contained report of up to 15 pages detailing the steps taken in performing your attribute analysis and the output of your modelling and analysis.

Project Details

You are free to select a period of time to analyse, as well as the type of licensed taxi you wish to focus on; however, a large volume of data is preferred. You are also free to select the attributes you want to study. Your report should explain and justify your selection decisions. The first stage of the project is to access and report the target data via descriptive statistics for a group of selected attributes, in order to characterise the data and set a clear research goal. Following that, you should build a statistical model to explain the relation between your input variables and response variables (both types of variable may be chosen by you from the data). Transforming the data or using an external data set may result in higher marks, if it is clearly justified. You should then refine your model. For example, you may improve your model by investigating the correlation of your selected attributes, and by ranking the importance of your input variables based on a clearly self-defined criterion. Justify your selection of the final model. You are also expected to highlight key findings based on your results and note findings that you believe are important or unanticipated.
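One possible way to make a "self-defined criterion" concrete is to rank candidate input variables by the absolute value of their Pearson correlation with the response. The sketch below is an illustration only: the attribute names (`trip_distance`, `passenger_count`, `hour`, `fare_amount`) and the small hand-made sample are assumptions rather than values taken from the trip record data, and correlation is just one of many criteria you could justify.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_inputs(inputs, response):
    """Rank input variables by |correlation| with the response, strongest first."""
    scores = {name: pearson(values, response) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy sample (made up for illustration, not real trip records).
inputs = {
    "trip_distance": [1.0, 2.5, 4.0, 6.0, 9.0],
    "passenger_count": [1, 2, 1, 3, 1],
    "hour": [8, 23, 14, 3, 18],
}
fare_amount = [6.5, 11.0, 16.0, 22.5, 33.0]

ranking = rank_inputs(inputs, fare_amount)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

Pearson correlation only captures linear association, so a report that uses it as the ranking criterion should say so and, where relevant, compare it with an alternative such as model-based importance.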


Your report should be a maximum of 15 pages and cover at least the following items:

  • Identify the research problem and attributes you want to study.
  • Choose appropriate data and describe the procedures for processing and analysing the data.
  • Interpretation of results: Description of trends, comparison of groups, or relationships among your chosen attributes.
  • Identify the most important attributes based on a stated criterion and your chosen response.
  • Make recommendations or predictions based on your results, or propose actions to be taken in practice to further improve performance.


Your report will be assessed across a number of areas, including:

  • Quality of your research problem
  • Justification of data and attribute selection
  • Quality of your model and attribute relations
  • Quality and clarity of interpretation of results
  • Quality and clarity of report

Submission details

Submissions should be made via Turnitin on the LMS.

  • Late submissions will incur a deduction of 2 marks per day (or part thereof).
  • If you submit late, you MUST email the subject co-ordinator, Chris Culnane, [email protected].

Extension policy: If you believe you have a valid reason to require an extension, you must contact the subject co-ordinator, Chris Culnane, ccul- [email protected], at the earliest opportunity, which in most instances should be well before the submission deadline. Requests for extensions are not granted automatically and are considered on a case-by-case basis. You will be required to supply supporting evidence such as a medical certificate. In addition, your git log file should illustrate the progress made on the project up to the date of your request.

Plagiarism policy: You are reminded that all submitted project work in this subject is to be your own individual work. Automated similarity checking software will be used to compare submissions against each other and against known public source code. It is University policy that cheating by students in any form is not permitted, and that work submitted for assessment purposes must be the independent work of the student concerned.

Further Hints

  • Using an external data set may result in higher marks.
  • Sub-sampling may help you to increase the scope of data you can cover.
  • Explain your handling of missing/unreasonable data and why any missing data does not undermine the validity of your analysis. You should report the amount of data that has been removed.
  • When you are trying to make comparisons, make sure your measurements are on the same scale.
  • You may want to try different methods for your analysis.
  • Always tell the reader what to look for in tables and figures. Be as factual and concise as possible in reporting your findings.
  • If necessary, define unfamiliar concepts and provide the appropriate background information to support your findings.
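
Several of the hints above (sub-sampling, removing unreasonable records while reporting how much was removed, and comparing attributes on the same scale) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the field names `trip_distance` and `fare_amount`, the validity rule (values present and strictly positive), the seed and the sample size are all hypothetical choices that you would need to justify for your own subset of the data.

```python
import random
import statistics

def subsample(records, k, seed=0):
    """Draw a reproducible simple random sample of at most k records."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))

def is_reasonable(record):
    """Hypothetical validity rule: both values present and strictly positive."""
    distance = record.get("trip_distance")
    fare = record.get("fare_amount")
    return distance is not None and fare is not None and distance > 0 and fare > 0

def clean(records):
    """Return the kept records and the number removed, so it can be reported."""
    kept = [r for r in records if is_reasonable(r)]
    return kept, len(records) - len(kept)

def standardise(values):
    """Convert values to z-scores so different attributes share one scale."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Made-up records for illustration only (not real trip data).
records = [
    {"trip_distance": 1.2, "fare_amount": 7.0},
    {"trip_distance": 0.0, "fare_amount": 52.0},  # zero distance: unreasonable
    {"trip_distance": 3.4, "fare_amount": None},  # missing fare
    {"trip_distance": 5.0, "fare_amount": 18.5},
    {"trip_distance": 8.1, "fare_amount": 29.0},
    {"trip_distance": 2.0, "fare_amount": -4.0},  # negative fare: unreasonable
]

kept, removed = clean(records)
print(f"Removed {removed} of {len(records)} records ({removed / len(records):.0%})")

# Standardised values can be compared directly across attributes.
z_distance = standardise([r["trip_distance"] for r in kept])
z_fare = standardise([r["fare_amount"] for r in kept])

# A fixed seed makes the sub-sample reproducible for the report.
sample = subsample(kept, k=2, seed=42)
```

Returning the removed count alongside the cleaned data keeps the figure available for the report, and fixing the random seed lets a reader reproduce exactly the same sub-sample.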