
### Big Data – Data Science – Final Assignment – January 2022

#### a.

Binary classification addresses classification tasks with exactly two class labels; e-mail spam detection is a well-known example. By convention, the abnormal state of interest is labelled positive and the normal state negative: when predicting whether a patient has cancer, "has cancer" is the positive state and "healthy" the negative one. Binary classification therefore suits data science questions that divide samples into two classes, for instance:

• E-mail spam detection
• Cancer detection
• Flight delay prediction
In addition, multi-class classification can be decomposed into multiple binary classification tasks.
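The decomposition mentioned above can be sketched with scikit-learn's one-vs-rest wrapper, which trains one binary classifier per class. This is an illustrative example on the Iris dataset, not part of the assignment:

```python
# Sketch: decomposing a 3-class problem into binary (one-vs-rest) tasks.
# Dataset and base model are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

# One binary logistic-regression classifier is trained per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))  # one binary classifier per class
```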

#### b.

Article: *International evaluation of an AI system for breast cancer screening*.

The task is to determine whether a patient has breast cancer, which is a binary classification problem.

#### i.

Opportunity: the AI system can identify breast cancer from X-ray images at earlier stages of the disease, which can greatly improve survival rates, so this technology has the potential for major impact in medicine. Challenge: when the AI diagnosis differs from their own, doctors tend to trust their own judgement, which may limit adoption; more broadly, many people do not yet trust AI, especially in a high-stakes medical field.

#### ii.

False Positive Rate and False Negative Rate are essential performance measures for this problem: if a cancer patient is diagnosed as negative, treatment is delayed, while diagnosing a healthy person as positive causes unnecessary distress. AUC-ROC was also used to measure the AI system's performance. At the same time, the system reduces the human reading workload, which makes it more suitable for practical use.
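These three measures can be computed directly from a confusion matrix and prediction scores. A small sketch with made-up labels and scores (not data from the study):

```python
# Sketch: False Positive Rate, False Negative Rate, and AUC-ROC for a binary
# screening task. Labels and scores below are invented for illustration.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0, 0, 1])   # 1 = cancer present
y_pred  = np.array([1, 1, 0, 0, 1, 0, 0, 1])   # model decisions
y_score = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.3, 0.7])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)   # healthy people wrongly flagged positive
fnr = fn / (fn + tp)   # cancers missed -> treatment delayed
auc = roc_auc_score(y_true, y_score)
print(fpr, fnr, auc)   # 0.25 0.25 0.9375
```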

#### a.

#### i.

I tried five models and applied the same preprocessing steps to all of them:
• First, convert the data to an Azure Dataset object.
• Convert the country column to numeric.
• Replace missing values with the column mean.
• Split the data into 80% training set and 20% test set.
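The last three steps above can be sketched in pandas/scikit-learn. The column names and toy data are assumptions for illustration; the Azure-specific Dataset conversion is omitted:

```python
# Sketch of the preprocessing steps, assuming a pandas DataFrame with a
# 'country' column; column names and values are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "country": ["UK", "FR", "UK", "DE"],
    "x1": [1.0, None, 3.0, 4.0],      # contains a missing value
    "target": [10.0, 12.0, 11.0, 15.0],
})

# Convert the country column to numeric codes.
df["country"] = df["country"].astype("category").cat.codes
# Replace missing values with the column mean.
df["x1"] = df["x1"].fillna(df["x1"].mean())
# Split into 80% training set and 20% test set.
train, test = train_test_split(df, test_size=0.2, random_state=0)
print(len(train), len(test))
```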

Model 1: Linear Regression

Figure 1: Linear Regression.

Parameters: L2 regularization weight: 0.001

Result:

Figure 2: Linear Regression Result.
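A minimal scikit-learn sketch of this configuration, assuming Azure's L2 regularization weight maps onto Ridge's `alpha` (an approximation), run on synthetic data:

```python
# Sketch: an L2-regularized linear model comparable to Linear Regression with
# L2 regularization weight 0.001. The alpha mapping is an assumption.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=0.001).fit(X, y)
rmse = mean_squared_error(y, model.predict(X)) ** 0.5  # Root Mean Squared Error
print(round(rmse, 3))
```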

Model 2: Neural Network Regression

Figure 3: Neural Network Regression.

Parameters:

• Number of hidden nodes: 20
• Learning rate: 0.005
• Number of learning iterations: 100
• Initial learning weight: 0.01

Result:

Figure 4: Neural Network Regression Result.
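A sketch of this configuration using scikit-learn's `MLPRegressor` as a stand-in for Azure's module; Azure's "initial learning weight" has no direct `MLPRegressor` equivalent, so that parameter is omitted (data is synthetic):

```python
# Sketch: a neural-network regressor with the listed parameters, using
# MLPRegressor as a stand-in for Azure's Neural Network Regression module.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

model = MLPRegressor(hidden_layer_sizes=(20,),   # 20 hidden nodes
                     learning_rate_init=0.005,   # learning rate
                     max_iter=100,               # learning iterations
                     random_state=0).fit(X, y)
print(model.n_layers_)  # input + 1 hidden + output = 3
```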
Model 3: Boosted Decision Tree Regression

Figure 5: Boosted Decision Tree Regression.

Parameters:

Figure 6: Parameter values of Boosted Decision Tree Regression.

Result:

Figure 7: Boosted Decision Tree Regression Result.
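A sketch of boosted decision tree regression using scikit-learn's `GradientBoostingRegressor` as a stand-in for Azure's module; since the parameter values in Figure 6 are not reproduced here, defaults are used and the data is synthetic:

```python
# Sketch: boosted decision tree regression with default parameters, as a
# stand-in for Azure's Boosted Decision Tree Regression module.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)  # a non-linear target

model = GradientBoostingRegressor(random_state=0).fit(X, y)
print(model.n_estimators_)  # boosting stages actually fitted
```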

Model 4: Bayesian Linear Regression

Figure 8: Bayesian Linear Regression.

Parameters: Regularization weight: 1

Result:

Figure 9: Bayesian Linear Regression Result.
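A sketch of Bayesian linear regression using scikit-learn's `BayesianRidge` as a stand-in for Azure's module; mapping "Regularization weight: 1" onto `BayesianRidge`'s priors is not exact, so defaults are kept (data is synthetic):

```python
# Sketch: Bayesian linear regression via BayesianRidge, a stand-in for
# Azure's Bayesian Linear Regression module.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=150)

model = BayesianRidge().fit(X, y)
# A Bayesian model also yields predictive uncertainty, not just a point estimate.
mean, std = model.predict(X[:1], return_std=True)
print(mean.shape, std.shape)
```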

Model 5: Poisson Regression

Figure 10: Poisson Regression.
Parameters: using default parameter values.

Result:
Figure 11: Poisson Regression Result.
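A sketch of Poisson regression with default parameters, using scikit-learn's `PoissonRegressor` as a stand-in for Azure's module; the count data below is synthetic:

```python
# Sketch: Poisson regression with default parameters. Poisson regression
# models non-negative count targets via a log link.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
lam = np.exp(0.3 * X[:, 0] + 0.2 * X[:, 1])  # Poisson rate (log link)
y = rng.poisson(lam)                         # non-negative counts

model = PoissonRegressor().fit(X, y)
print((model.predict(X) >= 0).all())  # predicted rates are non-negative
```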

#### ii.

I chose Bayesian Linear Regression since it has the lowest Root Mean Squared Error (RMSE).
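Selecting a model by lowest test-set RMSE can be sketched as below; the candidate models and data are illustrative, not the Azure experiment itself:

```python
# Sketch: choosing among candidate regressors by test-set RMSE.
import numpy as np
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {"Linear": LinearRegression(), "Bayesian": BayesianRidge()}
rmse = {name: mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te)) ** 0.5
        for name, m in models.items()}
best = min(rmse, key=rmse.get)  # model with the lowest RMSE wins
print(best, round(rmse[best], 3))
```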

#### b.

#### i.

Figure 12: Bayesian Linear Regression Model using Fisher Linear Discriminant Analysis.

I use Fisher Linear Discriminant Analysis for feature selection; with this method I do not need to choose the number of features manually. After feature selection, the RMSE is:

Figure 13: Result of Bayesian Linear Regression Model using Fisher Linear Discriminant Analysis.
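Fisher Linear Discriminant Analysis as a dimensionality-reduction step can be sketched with scikit-learn. Note that LDA requires discrete class labels, so the continuous target is binned first in this sketch; that binning step is an assumption, not something stated in the report:

```python
# Sketch: Fisher LDA for dimensionality reduction. LDA needs discrete
# classes, so the continuous target is discretized into 3 bins (assumption).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 5))
y_cont = X[:, 0] + rng.normal(scale=0.1, size=200)
y_cls = np.digitize(y_cont, np.quantile(y_cont, [0.33, 0.66]))  # 3 bins

lda = LinearDiscriminantAnalysis()
X_reduced = lda.fit_transform(X, y_cls)
# At most (n_classes - 1) discriminant components are kept, so the number
# of output features is determined automatically.
print(X_reduced.shape)
```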

#### ii.

I used Fisher Linear Discriminant Analysis.