### CS 165B Machine Learning, Winter 2019

Notes:

• This assignment is to be done individually. You may discuss the problems at a general level with others in the class (e.g., about the concepts underlying the question, or what lecture or reading material may be relevant), but all final answers must be your own work. You are expected to maintain the utmost level of academic integrity in the course.
• Be sure to re-read the Policy on Academic Integrity on the course website.
• Be aware of the late policy on the course website: each student has three penalty-free late days for the whole quarter. Beyond that, late submissions are penalized (10% of the maximum grade per day).
• Any updates or corrections will be posted on the Assignments page (of the course website), so check there occasionally.
• Please bring a hardcopy of your homework to class on the due date. (Make sure the writing is clear and easily readable, with good contrast between the writing and the page.)

#### 1 EM for Gaussian Mixture Model [50 points]

In this question we consider clustering 1-D data with a mixture of 2 Gaussians using the EM algorithm. You are given the 1-D data points $x = [1, 10, 20]$.

##### M step

Suppose the output of the E step is the following matrix:

###### 0 1

where entry $R_{i,c}$ is the probability of observation $x_i$ belonging to cluster $c$ (the responsibility of cluster $c$ for data point $i$). You just have to compute the M step. You may state the equations for maximum likelihood estimates of these quantities (which you should know) without proof; you just have to apply the equations to this data set. You may leave your answer in fractional form. Show your work.

1. [5 points] Write down the likelihood function you are trying to optimize.
2. [10 points] After performing the M step for the mixing weights $\pi_1$, $\pi_2$, what are the new values?
3. [10 points] After performing the M step for the means $\mu_1$ and $\mu_2$, what are the new values?
4. [10 points] After performing the M step for the standard deviations $\sigma_1$ and $\sigma_2$, what are the new values?

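
As a reminder of the standard maximum-likelihood updates this part asks you to apply, here is a minimal sketch in Python. The function name `m_step` is hypothetical, and the data and responsibility matrix in the usage lines are made up for illustration only; they are not the values from this assignment.

```python
import math

def m_step(x, R):
    """M step for a 1-D Gaussian mixture.

    x: list of n observations.
    R: n rows, each holding the responsibilities of every cluster for that point.
    Returns (weights, means, stds) with one entry per cluster.
    """
    n = len(x)
    n_clusters = len(R[0])
    weights, means, stds = [], [], []
    for c in range(n_clusters):
        # Effective number of points assigned to cluster c.
        n_c = sum(R[i][c] for i in range(n))
        # Responsibility-weighted mean and variance.
        mu = sum(R[i][c] * x[i] for i in range(n)) / n_c
        var = sum(R[i][c] * (x[i] - mu) ** 2 for i in range(n)) / n_c
        weights.append(n_c / n)
        means.append(mu)
        stds.append(math.sqrt(var))
    return weights, means, stds

# Made-up data and responsibilities (NOT the assignment's values).
x_demo = [0.0, 2.0, 4.0]
R_demo = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights, means, stds = m_step(x_demo, R_demo)
```

The sketch assumes every cluster has a nonzero total responsibility; otherwise the divisions by `n_c` are undefined.
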
##### E step

Now suppose the output of the M step is the answer to the previous section. You will compute the subsequent E step.

1. [5 points] Write down the formula for the probability of observation $x_i$ belonging to cluster $c$.
2. [10 points] After performing the E step, what is the new value of $R$?
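
The E step can be sketched in the same spirit: each responsibility is the weighted Gaussian density of a point under one cluster, normalized over all clusters. The names `gaussian_pdf` and `e_step` are hypothetical, and the parameters in the usage line are made up rather than taken from the assignment.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def e_step(x, weights, means, stds):
    """Return the responsibility matrix R, one row per observation."""
    R = []
    for xi in x:
        # Unnormalized posterior: mixing weight times likelihood.
        joint = [w * gaussian_pdf(xi, m, s)
                 for w, m, s in zip(weights, means, stds)]
        total = sum(joint)
        # Normalize so each row sums to 1.
        R.append([j / total for j in joint])
    return R

# Made-up parameters (NOT the assignment's values).
R_demo = e_step([0.0, 2.0, 4.0], [0.5, 0.5], [1.0, 3.0], [1.0, 1.0])
```

This sketch assumes every standard deviation is strictly positive. A cluster whose standard deviation is zero needs separate handling, since its density is not defined by the formula above.
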

#### 2 Programming Question (clustering with K-means) [45 points]

The programming portion of this assignment can be found at kmeans.zip. Follow the instructions in the README file. A code skeleton for the programming question is supplied in the folder. In class we discussed the K-means clustering algorithm. Your programming assignment this week is to implement the K-means algorithm on country survey data (also included in the zip file).

##### 2.1 The Data

The data comes from a UN survey on people's political priorities. We have aggregated the data across countries in the file country.csv. Each row lists the relative importance for each priority (between 0 and 1). You will cluster the data to find which countries are similar based on what the populations of those countries care about.

##### 2.2 The algorithm

Your algorithm should be implemented as follows:

1. Select $k$ starting centers that are points from your data set. You should be able to select these centers randomly or have them given as a parameter.
2. Assign each data point to the cluster associated with the nearest of the $k$ center points.
3. Re-calculate the centers as the mean vector of each cluster from (2).
4. Repeat steps (2) and (3) until convergence or iteration limit.

Define convergence as no change in label assignment from one step to another *or* having iterated 20 times (whichever comes first). Please count your iterations as follows: after 20 iterations, you should have assigned the points 20 times.
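
The steps and the iteration-counting rule above can be sketched as follows. This is an illustrative outline under stated assumptions, not the supplied code skeleton: the function names, the `seed` parameter, and the choice to keep the old center when a cluster becomes empty are all assumptions, and the demo points are made up.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, centers=None, max_iter=20, seed=None):
    """Run K-means and return (labels, centers, iterations).

    One iteration is counted every time the points are assigned (step 2).
    """
    if centers is None:
        # Step 1: pick k distinct data points at random.
        centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in centers]
    labels = None
    iterations = 0
    while iterations < max_iter:
        # Step 2: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        iterations += 1
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers, iterations

# Made-up 2-D points; centers start as the first k points.
pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers, iterations = kmeans(pts, 2, centers=pts[:2])
```

Passing `centers` explicitly covers the "given as a parameter" case, while leaving it as `None` selects starting centers at random.
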

##### 2.3 Within group sum of squares

The goal of clustering can be thought of as minimizing the variation within groups and consequently maximizing the variation between groups. A good model has a low sum of squares within each group. We define sum of squares in the traditional way. Let $C_k$ be the $k$th cluster and let $\mu_k$ be the empirical mean of the observations $x_i$ in cluster $C_k$. Then the within group sum of squares for cluster $C_k$ is defined as:

$$SS(k) = \sum_{i \in C_k} |x_i - \mu_k|^2$$

Please note that the term $|x_i - \mu_k|$ is the Euclidean distance between $x_i$ and $\mu_k$, and therefore should be calculated as

$$|x_i - \mu_k| = \sqrt{\sum_{j=1}^{d} (x_{ij} - \mu_{kj})^2},$$

where $d$ is the number of dimensions. Please note that this term is squared in $SS(k)$. If there are $K$ clusters total, then the sum of within group sum of squares is just the sum of all $K$ of these individual $SS(k)$ terms.
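
A minimal sketch of this computation, assuming points are lists of floats and labels are integer cluster indices (the function name `within_group_ss` and the demo values are hypothetical):

```python
def within_group_ss(points, labels, k):
    """Return (per_cluster, total): SS for each cluster and their sum."""
    per_cluster = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            # Assumption: an empty cluster contributes zero.
            per_cluster.append(0.0)
            continue
        # Empirical mean of the cluster, one coordinate at a time.
        mean = [sum(col) / len(members) for col in zip(*members)]
        # Sum of squared Euclidean distances to the mean.
        ss = sum(
            sum((a - m) ** 2 for a, m in zip(p, mean))
            for p in members
        )
        per_cluster.append(ss)
    return per_cluster, sum(per_cluster)

# Made-up points and labels for illustration.
pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
per_cluster, total = within_group_ss(pts, [0, 0, 1, 1], 2)
```

Because the distance is squared in the definition, the square root never needs to be taken explicitly.
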

##### 2.4 Questions
1. [10pts] The values of sum of within group sum of squares for $k = 5$, $k = 10$ and $k = 20$. Please start your centers with the first $k$ points in the dataset. So, if $k = 5$, your initial centroids will be the five countries: Afghanistan, Albania, …
2. [5pts] The number of iterations that k-means ran for $k = 5$, starting the centers as in the previous item. Make sure you count the iterations correctly. If you start with iteration $i = 0$ and at $i = 3$ the cluster assignments don't change, the number of iterations was 4, as you had to do step 2 four times to figure this out.
3. [15pts] A plot of the sum of within group sum of squares versus $k$ for $k = 1, \ldots, 50$. Please start your centers randomly (choose $k$ points from the dataset at random).
4. [5pts] Based on your plot, choose a best value of $k$ for this dataset. Explain why you chose this value.
5. [5pts] For your optimal value of $k$, examine the resulting clusters, and also how their cluster centers differ from the average over all countries. What general trends do you see in this data? For example, how well balanced are the clusters? Do the countries in each cluster appear to be related?
6. [5pts] Pick a country you are interested in. It could be the country you are from, somewhere you have visited, or a country you would like to learn more about. What cluster does this country belong to? What sets this cluster apart from other countries? Are the countries in this cluster related somehow (geographically, politically, economically)? Are there any unexpected countries in this cluster?