CSCI433/CSCI933: Machine Learning – Algorithms and Applications

Assignment Problem Set

School of Computing and Information Technology

Introduction

The tasks in this assignment are set to help you hone your practical skills in building and testing machine learning models, and to develop theoretical insight into the algorithms. You will design four regression predictors and compare their performance using the dataset provided. Furthermore, you will study the theoretical relationship between the algorithms so as to understand the basis of their performance. The assignment starts by requiring you to complete a reading and practice exercise covering three chapters of the book by Géron (2019, Chp. 1, 2, & 4). A copy of the book is on Moodle for your personal educational use in this subject only. By the end of this preliminary exercise you should understand:

  1. how to install Python and necessary libraries on your personal computer;
  2. key practical issues involved in building a machine learning model;
  3. the pipeline of an end-to-end machine learning project;
  4. how to load, clean, wrangle, visualize and understand data as an essential initial step in building a practical predictive model;
  5. how to build a simple regression model.

What needs to be done

  1. Read, study and understand the three chapters of the book (Géron, 2019, Chp. 1, 2, & 4). Ensure you write and run the code associated with the chapters.
  2. The popular Housing dataset is provided along with this specification in a .zip archive file. It contains a training dataset and a test dataset. Also included in the archive is a description of the variables (features) in the dataset. Ensure that you really understand the organisation of the dataset. This is absolutely important: check the size, shape, etc.
  3. Using the Python programming language and the scikit-learn machine learning library, implement the following regression models on the housing dataset: (i) ordinary least squares; (ii) ridge regression; (iii) PCA-regression; (iv) elastic-net regression. Your model is to predict the sale price of houses.
  4. Write a report on your experiments and the performance evaluation of the models.
  5. Your report will include a section that describes, mathematically, the connection between ordinary least squares regression, ridge regression and PCA-regression.
  6. Your report will be presented in a conference paper format (see accompanying template) and should detail your understanding of the theory of the techniques and give a succinct description of the experiments. You will describe the techniques in your own words with appropriate equations. When you write an equation, the meaning of the symbols must be explained, as well as the intuition behind the equation itself. Your report MUST NOT be more than four (4) pages in the format specified by the template. The 4-page count excludes the list of references.
  7. You may need to look at some of the books available on the Moodle site of this subject for more insight.
  8. Please cite appropriately any other paper or book you have read in gaining a deeper understanding of the concepts and methods.
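
As an illustration of the four model families named in the task list, the following is a minimal sketch of how they might be set up in scikit-learn. It is not a reference solution: the data is synthetic (generated with `make_regression`) rather than the provided housing files, "PCA-regression" is assumed to mean PCA followed by ordinary least squares on the retained components, and every hyperparameter value is an untuned placeholder.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption: the real files are not used here).
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hyperparameter values are illustrative placeholders, not tuned choices.
models = {
    "ols": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    # PCA-regression sketched as: standardise -> PCA -> least squares on the components.
    "pca_regression": make_pipeline(StandardScaler(), PCA(n_components=5),
                                    LinearRegression()),
    "elastic_net": make_pipeline(StandardScaler(),
                                 ElasticNet(alpha=0.1, l1_ratio=0.5)),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Root-mean-squared error on the held-out portion.
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    print(f"{name:15s} test RMSE = {rmse[name]:.3f}")
```

Wrapping each estimator in a pipeline keeps the scaling (and, for PCA-regression, the projection) fitted on the training portion only, so the same transformation is applied to unseen data at prediction time.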

What needs to be submitted

  • You will prepare a zip or rar file containing your report (4-page PDF file) and your Python code file (named eval_regression.py).
  • Your code must run from the command line as:
python3 eval_regression.py
and write results indicating that your code works (e.g. prediction errors for each method) to standard output (stdout).
  • The report should be typed (or typeset using LaTeX) in a 12-point font, with spacings as specified in the template. The submitted report MUST be a PDF file. Any Word document must be converted to PDF before submission. Non-PDF reports will not be marked.
  • Submit the zip or rar file via the Moodle dropbox provided, on or before the deadline.
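
The command-line requirement above could be met with a script laid out roughly as follows. This is only a skeleton under stated assumptions: the helper names `rmse` and `format_results` are hypothetical, and the single "mean_baseline" entry is computed from a toy list so that the script has something to print; a real submission would load the housing data and report one error per model.

```python
import math
import sys


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def format_results(results):
    """Return one 'method: RMSE = value' line per method, sorted by name."""
    return "\n".join(
        f"{name}: RMSE = {err:.4f}" for name, err in sorted(results.items())
    )


def main():
    # Placeholder only: a toy target and a constant (mean) prediction.
    # The real script would fit each regression model and collect its error here.
    y_true = [1.0, 2.0, 3.0, 4.0]
    mean_pred = [sum(y_true) / len(y_true)] * len(y_true)
    results = {"mean_baseline": rmse(y_true, mean_pred)}
    sys.stdout.write(format_results(results) + "\n")


if __name__ == "__main__":
    main()
```

Saved as eval_regression.py, this runs with `python3 eval_regression.py` and writes its results to stdout, which is the behaviour the submission instructions ask for.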

Report marking scheme

Your report should be according to the following format (i.e. headings):

Title (5 marks) – Give your report a clear title and write your name and student number. See template.

Introduction (5 marks) – Describe the problem of regression-based prediction. Explain why this approach is important and the role it plays in practical machine learning.

Theory and properties of predictors (40 marks) – Describe the four predictors you have tested in your experimentation. This is very important because it shows how well you understand the properties of the predictors. It is expected that you will write mathematical equations that describe the predictor models. The marks awarded to this section give an indication of the amount of work expected.

Theoretical links (10 marks) – Briefly derive the mathematics supporting the theoretical link amongst ordinary least squares regression, ridge regression and PCA-regression. Highlight the implications of varying the parameters.
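
One standard way to express this link, given here as a sketch rather than the expected derivation, uses the singular value decomposition. The assumptions (not stated in this specification) are that the design matrix $X$ is $n \times p$ with centred columns and full column rank, that $y$ is centred, and that PCA-regression means least squares on the first $k$ principal components. The symbols $U$, $D$, $V$, $d_j$, $u_j$, $\lambda$ and $k$ are introduced only for this illustration.

```latex
% Thin SVD of the centred design matrix, with singular values d_1 >= ... >= d_p > 0
X = U D V^{\top}, \qquad U = [u_1, \dots, u_p], \qquad D = \operatorname{diag}(d_1, \dots, d_p)

% Ordinary least squares: projection of y onto all p left singular directions
\hat{y}^{\mathrm{ols}} = X (X^{\top} X)^{-1} X^{\top} y
  = \sum_{j=1}^{p} u_j u_j^{\top} y

% Ridge regression with penalty \lambda >= 0: each direction is shrunk smoothly
\hat{y}^{\mathrm{ridge}}_{\lambda} = X (X^{\top} X + \lambda I)^{-1} X^{\top} y
  = \sum_{j=1}^{p} \frac{d_j^{2}}{d_j^{2} + \lambda} \, u_j u_j^{\top} y

% PCA-regression on the first k components: directions are kept or dropped
\hat{y}^{\mathrm{pcr}}_{k} = \sum_{j=1}^{k} u_j u_j^{\top} y
```

Under these assumptions all three fits are weighted sums of the same components $u_j u_j^{\top} y$. Ordinary least squares uses weight 1 for every direction, ridge uses the weight $d_j^2 / (d_j^2 + \lambda)$, which shrinks low-variance directions most, and PCA-regression uses weight 1 for the first $k$ directions and 0 for the rest. Setting $\lambda = 0$ or $k = p$ recovers ordinary least squares.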

Data preparation (20 marks) – Describe the data in your own words and highlight various statistics (mean, variance, etc.) along with any significant observation that could be gleaned from the data. You may include some graphs, but they must be described in your report (4 pages may not give you much room!). Describe the data preparation methods you used and their implications. Note that this is very important, as it can have a significant impact on the accuracy obtained from your predictor. You should discuss how you split the data for training, validation and testing.
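
The steps described above (summary statistics, handling of missing values, and a training/validation/test split) might look like the sketch below. The column names `area`, `rooms`, `age` and `price` are hypothetical, the table is randomly generated rather than read from the provided archive, and median imputation is only one of several reasonable cleaning choices. If the archive already supplies a separate test file, only the training/validation split would be needed.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
# Hypothetical columns standing in for the real housing features.
df = pd.DataFrame({
    "area": rng.normal(150.0, 30.0, n),
    "rooms": rng.integers(2, 8, n).astype(float),
    "age": rng.uniform(0.0, 60.0, n),
})
df["price"] = (1000.0 * df["area"] + 5000.0 * df["rooms"]
               - 300.0 * df["age"] + rng.normal(0.0, 10000.0, n))
df.loc[df.index[::25], "age"] = np.nan  # inject a few missing values to clean

# First look at the data: size, basic statistics, missing values per column.
print(df.shape)
print(df.describe().loc[["mean", "std", "min", "max"]])
print(df.isna().sum())

X = df.drop(columns="price")
y = df["price"]

# 60/20/20 split: carve off the test set first, then split the remainder.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)

# Impute using the training medians only, so nothing leaks from validation/test.
medians = X_train.median()
X_train, X_val, X_test = (part.fillna(medians)
                          for part in (X_train, X_val, X_test))
```

Computing the imputation values on the training portion alone keeps the validation and test estimates honest, which matters when the report compares models on those estimates.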

Experiments and evaluation (20 marks) – Describe the experiments you carried out, demonstrate your understanding of the models/algorithms, and justify the methods of performance evaluation you have adopted. State the comparative evaluation estimates and justify the differences. This section definitely requires that you show the results in a table or graph.
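
One possible evaluation protocol, sketched here under assumptions, is k-fold cross-validation with the root-mean-squared error reported as mean and standard deviation across folds. The example sweeps the ridge penalty `alpha` on synthetic data; the grid of values, the number of folds and the choice of RMSE as the metric are illustrative and are not prescribed by this specification.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=15.0, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

alphas = [0.01, 1.0, 100.0, 10000.0]  # illustrative grid, not a recommendation
summary = {}
for alpha in alphas:
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # scikit-learn returns negated MSE so that larger is better; flip the sign.
    mse = -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_mean_squared_error")
    fold_rmse = np.sqrt(mse)
    summary[alpha] = (float(fold_rmse.mean()), float(fold_rmse.std()))

# Plain-text table written to stdout.
print(f"{'alpha':>10s} {'mean RMSE':>10s} {'std':>8s}")
for alpha, (mean, std) in summary.items():
    print(f"{alpha:>10g} {mean:>10.3f} {std:>8.3f}")
```

The same loop can be repeated for the other model families so that all methods are compared on identical folds, and the printed table maps directly onto the results table the report is asked to include.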

Discussion and conclusions (25 marks) – You are required to reflect on and write about the differences amongst the various predictor models relative to their parameters, the amount of data required for training, the nature/format of data required, and the accuracy obtained. In addition, you are required to reflect on and describe any significant trend or observation you discovered with regard to which features may be dominant in determining the sale price of a house. For example, for a given house type, is there a subgroup of features that is more likely to fetch a high sale value?
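
A simple, hedged way to explore which features dominate a linear predictor is to standardise the features and rank the fitted coefficients by absolute value. The sketch below does this on synthetic data with hypothetical feature names (`area`, `rooms`, `age`). The ranking is only meaningful for comparably scaled features, and it reflects association in the fitted model rather than causation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
names = ["area", "rooms", "age"]  # hypothetical feature names

# Synthetic features on deliberately different scales.
X = rng.normal(size=(500, 3)) * np.array([30.0, 1.5, 15.0])
y = (1000.0 * X[:, 0] + 2000.0 * X[:, 1] - 100.0 * X[:, 2]
     + rng.normal(0.0, 5000.0, 500))

# Standardising puts each coefficient in "price change per standard deviation".
X_std = StandardScaler().fit_transform(X)
model = Ridge(alpha=1.0).fit(X_std, y)

# Sort features by the magnitude of their standardised coefficient.
ranking = sorted(zip(names, model.coef_),
                 key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    print(f"{name:6s} {coef:+.1f}")
```

Here `rooms` has the largest raw coefficient in the generating formula, yet `area` ranks first once the features are standardised, which is why the scaling step matters when discussing dominant features.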

References

Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). Sebastopol, CA: O'Reilly Media, Inc.