Big Data – Data Science – Final Assignment – January 2022

May 2022

Part 1 – Data Analytics Questions

1.

a.

Binary classification solves classification tasks that have exactly two class labels. A well-known example is e-mail spam detection. One class is usually treated as the normal state (negative) and the other as the abnormal state (positive). For example, when predicting whether a patient has cancer, the positive state means the patient has cancer and the negative state means the patient does not. Binary classification can address any data science question that involves dividing samples into two classes. For instance:

  • E-mail spam detection
  • Cancer detection
  • Flight delays
In addition, multi-class classification can be decomposed into multiple binary classification tasks.
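
One common way to decompose a multi-class problem is the one-vs-rest scheme, where each class in turn is treated as positive and all other classes as negative. The sketch below only illustrates how the labels are rewritten; the function name `one_vs_rest_labels` and the toy labels are assumptions made for illustration, not part of the assignment.

```python
def one_vs_rest_labels(labels):
    """Turn multi-class labels into one binary label list per class."""
    classes = sorted(set(labels))
    # For each class c, samples of class c become 1 (positive), all others 0.
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Toy labels, invented for illustration only.
labels = ["spam", "news", "social", "spam", "news"]
binary_tasks = one_vs_rest_labels(labels)
for cls, targets in binary_tasks.items():
    print(cls, targets)
```

A separate binary classifier would then be trained on each of these label lists, and the class whose classifier gives the highest score is predicted.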

b.

Article: International evaluation of an AI system for breast cancer screening.
The task is to determine whether a patient has breast cancer, which is a binary classification problem.

i.

Opportunity: Breast cancer can be identified from X-ray images at an earlier stage of the disease, which can greatly improve the survival rate. The technology therefore has the potential to make a real difference in medicine.

Challenge: When an AI diagnosis differs from a doctor's, doctors tend to trust their own result, which may limit the system's use. Many people also do not trust AI, especially in a field as critical as medicine.

ii.

False positive rate and false negative rate: these performance measures are essential for this problem. If a cancer patient is diagnosed as negative (a false negative), treatment can be delayed. If a healthy person is diagnosed as positive (a false positive), the patient suffers unnecessary distress. AUC-ROC was also used to measure the performance of the AI system. In addition, the system reduces workload, which makes it more conducive to practical use.
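
The measures named above can be computed as in the following minimal sketch. The labels and scores are toy values invented for illustration (1 = cancer, 0 = no cancer), the 0.5 threshold is an assumption, and the helper names `error_rates` and `auc_roc` are hypothetical. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives


def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = cancer, 0 = no cancer.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

fpr, fnr = error_rates(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(fpr, fnr, auc)
```

Lowering the threshold trades false negatives for false positives, whereas AUC summarises ranking quality across all thresholds.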

Part 2 – Data Analytics – Azure Task

a.

i.

I tried five models and applied the same preprocessing to all of them:
  • Convert the data to an Azure Dataset object.
  • Convert the country column to numeric values.
  • Replace missing values with the column mean.
  • Split the data into an 80% training set and a 20% test set.
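
Outside Azure, the last three preprocessing steps could be sketched roughly as follows in plain Python. This is not the Azure pipeline itself: the helper names, the toy `country` and `value` columns, and the fixed random seed are all assumptions made for illustration.

```python
import random


def encode_categories(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data, invented for illustration; "value" is a hypothetical numeric column.
country = ["FR", "DE", "FR", "UK", "DE", "UK", "FR", "DE", "UK", "FR"]
value = [1.0, None, 3.0, 4.0, None, 6.0, 7.0, 8.0, 9.0, 10.0]

rows = list(zip(encode_categories(country), impute_mean(value)))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Integer codes are the simplest way to make a categorical column numeric, but they impose an arbitrary ordering on the countries that a linear model will treat as meaningful.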

Model 1: Linear Regression

Figure 1: Linear Regression.

Parameters: L2 regularization weight: 0.001

Result:

Figure 2: Linear Regression Result.

Model 2: Neural Network Regression

Figure 3: Neural Network Regression.

Parameters:
  • Number of hidden nodes: 20
  • Learning rate: 0.005
  • Number of learning iterations: 100
  • Initial learning weight: 0.01

Result:

Figure 4: Neural Network Regression Result.

Model 3: Boosted Decision Tree Regression

Figure 5: Boosted Decision Tree Regression.

Parameters:

Figure 6: Parameter values of Boosted Decision Tree Regression.

Result:

Figure 7: Boosted Decision Tree Regression Result.

Model 4: Bayesian Linear Regression

Figure 8: Bayesian Linear Regression.

Parameters: Regularization weight: 1

Result:

Figure 9: Bayesian Linear Regression Result.

Model 5: Poisson Regression

Figure 10: Poisson Regression.

Parameters: Using default parameter values.

Result:

Figure 11: Poisson Regression Result.

ii.

I chose to implement the Bayesian Linear Regression model because it achieved the lowest Root Mean Squared Error (RMSE) of the five models.
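
The selection rule used here, picking the model with the lowest RMSE on the test set, can be sketched as below. The targets, predictions, and the model names `model_a` and `model_b` are toy values invented for illustration; they are not the results shown in the figures.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy targets and predictions, invented for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.0, 5.0, 8.0, 9.0],
    "model_b": [3.0, 5.0, 7.0, 11.0],
}

# Score every candidate on the same test targets, then keep the lowest RMSE.
scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Because RMSE squares each error before averaging, a single large miss (as in `model_b`) is penalised more heavily than several small ones.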

b.

i.

Figure 12: Bayesian Linear Regression Model using Fisher Linear Discriminant Analysis.

I used Fisher Linear Discriminant Analysis for feature selection, and I did not need to choose the number of features. After feature selection, the RMSE is:

Figure 13: Result of Bayesian Linear Regression Model using Fisher Linear Discriminant Analysis.

ii.

I used Fisher Linear Discriminant Analysis.