Math 308: Fundamentals of Statistical Learning

Assignment 3

(Q1) Multiple Correspondence Analysis: For this question we will look at a canonical breast cancer dataset from the machine learning community, which we previously analyzed with bivariate correspondence analysis in Assignment 2. The dataset is publicly available from the UCI Machine Learning Repository and has the following variables (a description can be found here).

  • Recurrent event ("RecEv"): whether or not the patient experienced cancer recurrence;
  • Age ("AgeGrp"): age group of the patient at the time of diagnosis;
  • Menopause ("Meno"): whether the patient was premenopausal ("premeno") or postmenopausal at the time of diagnosis ("lt40" means menopause occurred before age 40, "ge40" means at or after age 40);
  • Tumor size ("Size"): the greatest diameter (in mm) of the excised tumor;
  • Inv-nodes ("InvNodes"): the number (range 0 – 39) of axillary lymph nodes that contain metastatic breast cancer visible on histological examination;
  • Node caps ("NodeCaps"): if the cancer does metastasize to a lymph node, although outside the original site of the tumor it may remain "contained" by the capsule of the lymph node. However, over time, and with more aggressive disease, the tumor may replace the lymph node and then penetrate the capsule, allowing it to invade the surrounding tissues;
  • Degree of malignancy ("DegMal"): the histological grade (range 1-3) of the tumor. Grade 1 tumors predominantly consist of cells that, while neoplastic, retain many of their usual characteristics. Grade 3 tumors predominantly consist of cells that are highly abnormal;
  • Breast side ("Side"): breast cancer may obviously occur in either the left or right breast;
  • Breast quadrant ("Quad"): the breast may be divided into four quadrants, using the nipple as a central point;
  • Irradiation ("Irrad"): radiation therapy is a treatment that uses high-energy x-rays to destroy cancer cells.
library(tidyverse)
library(kableExtra)
library(ggpubr)
library(FactoMineR)

breast_cancer <- read_csv("breast_cancer_data.csv", col_names = FALSE, col_types = cols())
names(breast_cancer) <- c("RecEv", "AgeGrp", "Meno", "Size", "InvNodes",
                          "NodeCaps", "DegMal", "Side", "Quad", "Irrad")
### Remove missing obs (coded as "?")
breast_cancer <- breast_cancer %>% filter(Quad != "?", NodeCaps != "?")

math 308: Winter 2023 Shomoita Alam

breast_cancer <- breast_cancer %>% mutate(AgeGrp = paste("AG", AgeGrp, sep = ""),
                                          InvNodes = paste("IN", InvNodes, sep = ""))
head(breast_cancer) %>% kable() %>% kable_styling()

RecEv                 AgeGrp   Meno     Size   InvNodes  NodeCaps  DegMal  Side   Quad       Irrad
no-recurrence-events  AG30-39  premeno  30-34  IN0-2     no        3       left   left_low   no
no-recurrence-events  AG40-49  premeno  20-24  IN0-2     no        2       right  right_up   no
no-recurrence-events  AG40-49  premeno  20-24  IN0-2     no        2       left   left_low   no
no-recurrence-events  AG60-69  ge40     15-19  IN0-2     no        2       right  left_up    no
no-recurrence-events  AG40-49  premeno  0-4    IN0-2     no        2       right  right_low  no
no-recurrence-events  AG60-69  ge40     15-19  IN0-2     no        2       left   left_low   no
For this part, we will focus on all of the variables in a single multiple correspondence analysis. Note that in order to use MCA, we need to make one more change to the dataset:
## Convert all columns to factors
breast_cancer <- breast_cancer %>% mutate(across(everything(), factor))
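For intuition on why the factor conversion matters: MCA is a correspondence analysis of the indicator (complete disjunctive) matrix, which has one 0/1 column for every level of every variable, so each variable must be categorical. The base-R sketch below builds that matrix for the six rows shown in the `head()` table. The helper `indicator_matrix()` is hypothetical (FactoMineR's `MCA()` constructs the equivalent matrix internally), and six rows are far too few to draw any conclusions; the point is only the structure.

```r
# Toy input: the six rows shown in the head() table (structure only)
toy <- data.frame(
  RecEv    = rep("no-recurrence-events", 6),
  AgeGrp   = c("AG30-39", "AG40-49", "AG40-49", "AG60-69", "AG40-49", "AG60-69"),
  Meno     = c("premeno", "premeno", "premeno", "ge40", "premeno", "ge40"),
  Size     = c("30-34", "20-24", "20-24", "15-19", "0-4", "15-19"),
  InvNodes = rep("IN0-2", 6),
  NodeCaps = rep("no", 6),
  DegMal   = c(3, 2, 2, 2, 2, 2),
  Side     = c("left", "right", "left", "right", "right", "left"),
  Quad     = c("left_low", "right_up", "left_low", "left_up", "right_low", "left_low"),
  Irrad    = rep("no", 6)
)
# Same step as the setup code, in base R: every column becomes a factor
toy[] <- lapply(toy, factor)

# Hypothetical helper: one 0/1 column per level of every factor
indicator_matrix <- function(df) {
  blocks <- lapply(names(df), function(v) {
    x <- df[[v]]
    z <- outer(as.integer(x), seq_len(nlevels(x)), "==") * 1
    colnames(z) <- paste(v, levels(x), sep = "_")
    z
  })
  do.call(cbind, blocks)
}

Z <- indicator_matrix(toy)
dim(Z)      # 6 rows x 21 indicator columns (J = 21 observed levels)
rowSums(Z)  # every row sums to Q = 10, the number of variables
```

Each row sums to Q because every patient falls in exactly one level of each variable, and the number of columns J is the total number of levels; MCA then works with this n x J table.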
a) (20 points) Conduct a multiple correspondence analysis for this data, being sure to complete the following tasks:
  • Report the table of eigenvalues for the first 5 components and explain how many components you think are sufficient to analyze the data.
  • Generate a factor map for the first two dimensions of the correspondence analysis (regardless of your answer to the first bullet point). Summarize which levels of which variables are most strongly associated with each of the first two dimensions, and explain how you made your decisions.
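One way to sanity-check FactoMineR output for part (a) is to compute MCA directly with `svd()` of the standardized indicator matrix. The sketch below is an illustration under stated assumptions, not the expected solution: it runs on hypothetical simulated data (not the breast cancer file), the helper `mca_svd()` is hypothetical, and the two cut-offs are common rules of thumb rather than requirements of the assignment. The first keeps dimensions whose eigenvalue exceeds the mean eigenvalue 1/Q (Q = number of variables, so 0.1 for the ten breast cancer variables); the second flags categories whose contribution to a dimension exceeds the average 100/J percent (J = total number of levels). On the real data, `res <- MCA(breast_cancer, graph = FALSE)` should report the same eigenvalues in `res$eig` and the same category contributions in `res$var$contrib`, and `plot(res, invisible = "ind")` should draw a category-only factor map.

```r
set.seed(308)
n <- 200
# Hypothetical simulated data: DegMal and Irrad depend on recurrence, Side and AgeGrp do not
rec <- sample(c("no-recurrence-events", "recurrence-events"), n,
              replace = TRUE, prob = c(0.7, 0.3))
recurred <- rec == "recurrence-events"
sim <- data.frame(
  RecEv  = rec,
  DegMal = ifelse(recurred,
                  sample(1:3, n, replace = TRUE, prob = c(0.1, 0.3, 0.6)),
                  sample(1:3, n, replace = TRUE, prob = c(0.4, 0.4, 0.2))),
  Irrad  = ifelse(recurred,
                  sample(c("yes", "no"), n, replace = TRUE, prob = c(0.5, 0.5)),
                  sample(c("yes", "no"), n, replace = TRUE, prob = c(0.2, 0.8))),
  Side   = sample(c("left", "right"), n, replace = TRUE),
  AgeGrp = sample(c("AG30-39", "AG40-49", "AG50-59"), n, replace = TRUE)
)
sim[] <- lapply(sim, factor)

# Hypothetical helper: MCA as a correspondence analysis of the indicator matrix
mca_svd <- function(df) {
  Z <- do.call(cbind, lapply(names(df), function(v) {
    x <- df[[v]]
    z <- outer(as.integer(x), seq_len(nlevels(x)), "==") * 1
    colnames(z) <- paste(v, levels(x), sep = "_")
    z
  }))
  P  <- Z / sum(Z)                                 # correspondence matrix
  rw <- rowSums(P)                                 # row masses (1/n each)
  cm <- colSums(P)                                 # category masses
  S  <- (P - outer(rw, cm)) / sqrt(outer(rw, cm))  # standardized residuals
  dec  <- svd(S)
  keep <- dec$d > 1e-8                             # drop numerically null dimensions
  d <- dec$d[keep]
  U <- dec$u[, keep, drop = FALSE]
  V <- dec$v[, keep, drop = FALSE]
  rownames(V) <- colnames(Z)
  list(eig     = d^2,                              # eigenvalues (principal inertias)
       ind     = sweep(U / sqrt(rw), 2, d, "*"),   # individual coordinates
       coord   = sweep(V / sqrt(cm), 2, d, "*"),   # category coordinates
       contrib = 100 * V^2,                        # category contributions (%)
       Q = ncol(df), J = ncol(Z))
}

res <- mca_svd(sim)

# Eigenvalue table for the first 5 dimensions, with the 1/Q rule of thumb
K   <- min(5, length(res$eig))
pct <- 100 * res$eig / sum(res$eig)
eig_tab <- data.frame(dim = seq_len(K), eigenvalue = res$eig[seq_len(K)],
                      pct = pct[seq_len(K)], cum_pct = cumsum(pct)[seq_len(K)],
                      above_1_over_Q = res$eig[seq_len(K)] > 1 / res$Q)
print(eig_tab, digits = 3)

# Categories contributing more than the average 100/J percent to Dims 1 and 2
thr <- 100 / res$J
for (k in 1:2) {
  ctr <- sort(res$contrib[, k], decreasing = TRUE)
  cat("Dim", k, ": categories above", round(thr, 1), "%\n")
  print(round(ctr[ctr > thr], 1))
}

# Factor map of the categories on the first two dimensions
plot(res$coord[, 1:2], type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(res$coord[, 1:2], labels = rownames(res$coord), cex = 0.7)
abline(h = 0, v = 0, lty = 2)
```

The eigenvalues always sum to J/Q - 1, which is why the average nonzero eigenvalue is 1/Q; the contribution threshold works the same way, since each dimension's contributions sum to 100.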
b) (10 points) Of particular interest is which variables are related to cancer recurrence. Does the recurrence variable load highly on any of the dimensions that you discussed in part (a)? If so, explain which dimensions those are and, for each of those dimensions, indicate which other variable levels also load highly on it.
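For part (b), one way to quantify whether recurrence "loads" on a dimension is the squared correlation ratio eta^2 between RecEv and that dimension's individual coordinates. It equals the R^2 of a one-way ANOVA of the coordinates on RecEv, and FactoMineR should report the same quantity in `res$var$eta2` (with `dimdesc(res, axes = 1:2)` adding significance tests). The sketch defines a hypothetical helper `mca_svd()` (MCA via `svd()` of the standardized indicator matrix) and runs it on hypothetical simulated data in which recurrence is tied to DegMal by construction. The data, the helper and the 100/J threshold for "other levels that load highly" are all assumptions for illustration.

```r
set.seed(308)
n <- 200
# Hypothetical simulated data: DegMal depends on recurrence by construction
rec <- sample(c("no-recurrence-events", "recurrence-events"), n,
              replace = TRUE, prob = c(0.7, 0.3))
sim <- data.frame(
  RecEv  = rec,
  DegMal = ifelse(rec == "recurrence-events",
                  sample(1:3, n, replace = TRUE, prob = c(0.1, 0.3, 0.6)),
                  sample(1:3, n, replace = TRUE, prob = c(0.4, 0.4, 0.2))),
  Side   = sample(c("left", "right"), n, replace = TRUE),
  AgeGrp = sample(c("AG30-39", "AG40-49", "AG50-59"), n, replace = TRUE)
)
sim[] <- lapply(sim, factor)

# Hypothetical helper: MCA as a correspondence analysis of the indicator matrix
mca_svd <- function(df) {
  Z <- do.call(cbind, lapply(names(df), function(v) {
    x <- df[[v]]
    z <- outer(as.integer(x), seq_len(nlevels(x)), "==") * 1
    colnames(z) <- paste(v, levels(x), sep = "_")
    z
  }))
  P  <- Z / sum(Z)
  rw <- rowSums(P)
  cm <- colSums(P)
  S  <- (P - outer(rw, cm)) / sqrt(outer(rw, cm))  # standardized residuals
  dec  <- svd(S)
  keep <- dec$d > 1e-8                             # drop numerically null dimensions
  d <- dec$d[keep]
  U <- dec$u[, keep, drop = FALSE]
  V <- dec$v[, keep, drop = FALSE]
  rownames(V) <- colnames(Z)
  list(eig = d^2, ind = sweep(U / sqrt(rw), 2, d, "*"),
       coord = sweep(V / sqrt(cm), 2, d, "*"), contrib = 100 * V^2,
       Q = ncol(df), J = ncol(Z))
}

res <- mca_svd(sim)

# eta^2: share of each dimension's variance explained by RecEv
# (R^2 of a one-way ANOVA of the individual coordinates on RecEv)
eta2_rec <- sapply(seq_along(res$eig), function(k)
  summary(lm(res$ind[, k] ~ sim$RecEv))$r.squared)
names(eta2_rec) <- paste0("Dim", seq_along(res$eig))
print(round(eta2_rec, 3))

# Coordinates and contributions (%) of the RecEv levels on Dims 1 and 2
rec_rows <- grep("^RecEv_", rownames(res$coord))
tab <- cbind(res$coord[rec_rows, 1:2], res$contrib[rec_rows, 1:2])
colnames(tab) <- c("coord_Dim1", "coord_Dim2", "ctr_Dim1", "ctr_Dim2")
print(round(tab, 3))

# Other levels contributing more than the average 100/J percent
# to the dimension most associated with RecEv
k_best <- which.max(eta2_rec)
ctr <- res$contrib[, k_best]
print(round(sort(ctr[ctr > 100 / res$J], decreasing = TRUE), 1))
```

Because RecEv has two levels whose mass-weighted mean coordinate is zero, its two categories sit on opposite sides of the origin on any dimension it loads on; comparing the sign of each recurrence level's coordinate with the signs of the other high-contribution levels indicates which levels tend to co-occur with recurrence.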
