Math 308: Fundamentals of Statistical Learning
Assignment 3
(Q2) In this question, you will use unsupervised clustering methods on a canonical dataset from the clustering and classification literature. You can access the dataset by installing the rrcov package:
# install.packages("rrcov")   # run once if rrcov is not yet installed
library(tidyverse)
library(rrcov)
library(GGally)
data(diabetes)
glimpse(diabetes)
## Rows: 145
## Columns: 6
## $ rw <dbl> 0.81, 0.95, 0.94, 1.04, 1.00, 0.76, 0.91, 1.10, 0.99, 0.78, 0.~
## $ fpg <int> 80, 97, 105, 90, 90, 86, 100, 85, 97, 97, 91, 87, 78, 90, 86, ~
## $ glucose <int> 356, 289, 319, 356, 323, 381, 350, 301, 379, 296, 353, 306, 29~
## $ insulin <int> 124, 117, 143, 199, 240, 157, 221, 186, 142, 131, 221, 178, 13~
## $ sspg <int> 55, 76, 105, 108, 143, 165, 119, 105, 98, 94, 53, 66, 142, 93,~
## $ group <fct> normal, normal, normal, normal, normal, normal, normal, normal~
?diabetes
The diabetes dataset contains 5 physiological measurements taken from 145 non-obese adult patients,
collected for the purpose of clustering patients by their diabetes diagnosis: normal (non-diabetic),
chemical, or overt.
Because we are using unsupervised clustering, we will not cluster on the group variable. For
your clustering algorithms, use the following data:
diabetes_no_group <- diabetes %>% dplyr::select(-group)
str(diabetes_no_group)
## 'data.frame': 145 obs. of 5 variables:
## $ rw : num 0.81 0.95 0.94 1.04 1 0.76 0.91 1.1 0.99 0.78 ...
## $ fpg : int 80 97 105 90 90 86 100 85 97 97 ...
## $ glucose: int 356 289 319 356 323 381 350 301 379 296 ...
## $ insulin: int 124 117 143 199 240 157 221 186 142 131 ...
## $ sspg : int 55 76 105 108 143 165 119 105 98 94 ...
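k-means assigns points by Euclidean distance, so columns measured in the hundreds (glucose, insulin, sspg) can dominate a column measured near 1 (rw). The assignment does not say whether to rescale. The following is a minimal sketch, assuming you choose to standardize each column to mean 0 and standard deviation 1 with base R's scale() before clustering. It uses base R indexing in place of dplyr::select(-group) so that it needs only rrcov.

```r
library(rrcov)  # provides the diabetes data set
data(diabetes)

# Drop the label so it cannot leak into the clustering
# (base R equivalent of diabetes %>% dplyr::select(-group))
diabetes_no_group <- diabetes[, setdiff(names(diabetes), "group")]

# The raw columns have very different spreads
print(round(sapply(diabetes_no_group, sd), 2))

# Assumption: standardize every column before running k-means
diabetes_scaled <- scale(diabetes_no_group)

# Each standardized column now has mean (numerically) 0 and standard deviation 1
print(round(colMeans(diabetes_scaled), 10))
print(apply(diabetes_scaled, 2, sd))
```

Whether to cluster on raw or standardized values is a modelling choice. State whichever you use, because it can change the number of clusters the criteria suggest.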
math 308: Winter 2023 Shomoita Alam
a) (10 points) Use the fviz_nbclust function to select the optimal number of clusters. Be sure to
clearly state how you chose an appropriate number of clusters without using information from
the group variable.
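One way to approach part (a) is to compare several label-free criteria that fviz_nbclust (from the factoextra package) supports. The elbow method ("wss") plots the total within-cluster sum of squares against k and looks for the point where further clusters stop reducing it much. The silhouette method ("silhouette") favours the k with the largest average silhouette width. The gap statistic ("gap_stat") compares the within-cluster dispersion to that of bootstrap reference data. This sketch assumes standardized data, kmeans as the clustering function, nstart = 25 random starts, nboot = 50 bootstrap samples and a seed of 308. These are illustrative settings, not values the assignment prescribes.

```r
library(rrcov)       # diabetes data
library(factoextra)  # fviz_nbclust()

data(diabetes)
diabetes_no_group <- diabetes[, setdiff(names(diabetes), "group")]
# Assumption: cluster on standardized columns
diabetes_scaled <- scale(diabetes_no_group)

set.seed(308)  # k-means and the gap statistic both involve randomness

# Elbow method: total within-cluster sum of squares for k = 1..10
p_wss <- fviz_nbclust(diabetes_scaled, kmeans, method = "wss",
                      k.max = 10, nstart = 25)

# Average silhouette width for each k (higher is better)
p_sil <- fviz_nbclust(diabetes_scaled, kmeans, method = "silhouette",
                      k.max = 10, nstart = 25)

# Gap statistic against nboot bootstrap reference data sets
p_gap <- fviz_nbclust(diabetes_scaled, kmeans, method = "gap_stat",
                      k.max = 10, nstart = 25, nboot = 50)

print(p_wss)
print(p_sil)
print(p_gap)
```

None of these criteria use the group column, so any of them (or agreement among them) gives a justification that respects the unsupervised setting. If the criteria disagree, say so and explain which one you relied on.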
b) (10 points) With your chosen number of clusters, run the k-means algorithm to cluster the data
and display the clusters using plots. Depending on the number of clusters you chose, explain in a
few words how the plot was generated.
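For part (b), the following is a minimal sketch of fitting k-means and plotting the result with factoextra's fviz_cluster. Here k = 3 is a hypothetical placeholder. Replace it with the value your part (a) analysis supports. The sketch again assumes standardized data, nstart = 25 and seed 308. Because the data have five variables, fviz_cluster cannot plot them directly. It runs a principal component analysis and draws each patient on the first two principal components, coloured by assigned cluster. That projection is what the "how the plot was generated" part of the question is about.

```r
library(rrcov)       # diabetes data
library(factoextra)  # fviz_cluster()

data(diabetes)
diabetes_no_group <- diabetes[, setdiff(names(diabetes), "group")]
# Assumption: cluster on standardized columns
diabetes_scaled <- scale(diabetes_no_group)

k <- 3  # hypothetical choice: use the k selected in part (a)

set.seed(308)  # k-means starts from random centres
km <- kmeans(diabetes_scaled, centers = k, nstart = 25)

print(km$size)  # number of patients in each cluster

# Cluster means on the original (unscaled) measurement scale, for interpretation
print(aggregate(diabetes_no_group, by = list(cluster = km$cluster), FUN = mean))

# More than two variables, so fviz_cluster projects onto the first two
# principal components and draws a convex hull around each cluster
p <- fviz_cluster(km, data = diabetes_scaled,
                  geom = "point", ellipse.type = "convex")
print(p)
```

The axis labels of this plot report the share of variance each principal component explains. This helps you judge how faithfully the two-dimensional picture represents the five-dimensional clustering.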