random forest
As you may be aware, your final project deliverable is currently required to include an implementation correctness component for your method. In its current format, the proposed process for demonstrating that your from-scratch implementation is correct is to closely compare its behavior to some off-the-shelf implementation of the same method. However, we have realized that, because of the various optimizations and modifications that many off-the-shelf implementations contain, it can be hard to provide a meaningful comparison when the from-scratch implementation is based on a simplified vanilla version of a method.
Therefore, we have decided to modify this process: you will instead demonstrate that your method performs correctly by inspecting its behavior on a specially designed Artificial dataset. You can find this dataset both in the "Important Documents" module and attached to this announcement. It contains 17 datapoints with 2 features each, along with their designated class. Specifically, the implementation correctness modifications are as follows:
 Instead of submitting the implementation correctness report as a separate document from the final report, please add a separate section at the end of your final report that contains the implementation correctness component. In other words, no implementation correctness report PDF needs to be submitted.
 Depending on the method you implement, you will have to perform the following:
 – kNN: For k=3, no data preprocessing, and a test datapoint [1.4, 3]:
   1. Show a scatterplot of the dataset including the test datapoint, where
      – points in different classes are shown in different colors
      – the test datapoint is shown in a different color than all other datapoints
      – based on Euclidean distance, the 3 closest neighbors of the test datapoint are denoted with filled circles and all other points with open circles
   2. Report which class the test datapoint is classified as and why.
   3. Repeat steps 1 and 2 using Manhattan distance instead.
 – Decision Tree: Explain
   1. the order of the features selected by the tree and why it is correct
   2. which feature value each split is based on and why it is correct
 – Random Forest:
   1. Follow the steps above to demonstrate the correctness of your individual base trees as if you were implementing just a single decision tree.
   2. Construct a random forest with 20 base trees, each of which is trained on 12 randomly selected datapoints and on all features. Then consider the test datapoint [4, 4] and report
      I. the number of base trees that classify it as class 1 and as class 2, respectively
      II. the final classification that the random forest produces. Explain if the classification makes sense and why.
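The random forest check above (20 base trees, 12 sampled datapoints each, a vote on [4, 4]) can be sketched as follows. This is a minimal illustration under stated assumptions, not the required implementation: `X` and `y` are made-up placeholder data rather than the Artificial dataset, the helper names (`gini`, `best_split`, `build_tree`, `predict`, `forest_votes`) are hypothetical, Gini impurity is assumed as the split criterion, and the 12 datapoints are assumed to be drawn without replacement, which the announcement does not specify.

```python
import random
from collections import Counter

# Placeholder data (NOT the course's Artificial dataset): 17 points, 2 features.
X = [[1, 1], [1, 2], [2, 1], [2, 2], [1, 3], [2, 3], [3, 1], [3, 2],
     [5, 5], [5, 6], [6, 5], [6, 6], [5, 7], [6, 7], [7, 5], [7, 6], [7, 7]]
y = [1] * 8 + [2] * 9

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    n, best, best_score = len(y), None, gini(y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint of adjacent values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:  # keep only strict improvements
                best, best_score = (f, t), score
    return best

def build_tree(X, y):
    """A leaf is a class label; an internal node is (feature, threshold, left, right)."""
    split = best_split(X, y)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li]),
            build_tree([X[i] for i in ri], [y[i] for i in ri]))

def predict(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def forest_votes(X, y, x_test, n_trees=20, sample_size=12, seed=0):
    """Train n_trees base trees on random subsets and count their votes."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_trees):
        idx = rng.sample(range(len(y)), sample_size)  # assumption: no replacement
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx])
        votes[predict(tree, x_test)] += 1
    return votes

votes = forest_votes(X, y, [4, 4])
print(dict(votes))                 # number of base trees voting for each class
print(votes.most_common(1)[0][0])  # majority class; a 10-10 tie needs an explicit rule
```

Printing each base tree (the nested tuples) alongside its vote is one way to connect the per-tree correctness argument to the forest-level counts.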
 – Single-linkage:
   1. Show the scatterplot of the dataset with a unique integer number on top of each point.
   2. Show the corresponding dendrogram, where each leaf is labeled with the same integer number as the corresponding point of the scatterplot.
   3. Explain if and how exactly we can obtain clusters identical to the ground-truth clusters shown in the dataset.
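A naive single-linkage procedure that produces the merge sequence behind such a dendrogram might look like the sketch below. The function names `single_linkage` and `clusters_after` and the six example points are hypothetical; the points are not the Artificial dataset. Each merge row is stored as (cluster a, cluster b, distance, size), with new clusters numbered from n upward, which to my understanding mirrors the row layout of SciPy's linkage matrix, so the rows could probably be handed to a dendrogram plotting routine after conversion to a float array.

```python
import math

def single_linkage(points):
    """Naive agglomerative clustering with single linkage (minimum distance)."""
    n = len(points)
    clusters = {i: [i] for i in range(n)}   # cluster id -> member point indices
    merges, next_id = [], n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # single linkage: distance between the closest pair of members
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return merges

def clusters_after(merges, n_points, k):
    """Replay merges until k clusters remain (cutting the dendrogram)."""
    clusters = {i: [i] for i in range(n_points)}
    for step, (a, b, _, _) in enumerate(merges[: n_points - k]):
        clusters[n_points + step] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical example: two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
merges = single_linkage(pts)
for row in merges:
    print(row)
print(clusters_after(merges, len(pts), 2))
```

Whether stopping before the last merge reproduces the ground-truth clusters depends on the data: it works when the smallest gap between the two true clusters is larger than every merge distance inside them.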
 – DBSCAN: For min_pts = 2 and eps = 1:
   1. Create the scatterplot of the dataset with an integer on top of each point showing the order in which it was visited by the algorithm. Explain why this order is correct.
   2. Denote the core points with filled circles and the non-core points with empty circles. Explain why each core and non-core point was designated as such.
   3. Use different colors for points belonging to different clusters as obtained by DBSCAN (do not color based on the ground-truth clusters given in the dataset).
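The three quantities requested above (visit order, core/non-core status, cluster labels) can all be recorded in one pass. The sketch below is a hedged illustration: the function name `dbscan` and the example points are hypothetical, points are assumed to be scanned in index order, and a point is assumed to count toward its own min_pts with distances exactly equal to eps treated as inside the neighbourhood. The course definition may differ on any of these.

```python
import math

def dbscan(points, eps=1.0, min_pts=2):
    """Return (labels, is_core, visit_order); label -1 marks noise."""
    n = len(points)

    def neighbours(i):
        # Assumption: a point is in its own eps-neighbourhood, and a
        # distance equal to eps counts as "within eps".
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    is_core = [len(neighbours(i)) >= min_pts for i in range(n)]
    labels = [None] * n          # None = not visited yet
    visit_order = []
    cluster = -1
    for i in range(n):           # assumption: points are scanned in index order
        if labels[i] is not None:
            continue
        visit_order.append(i)
        if not is_core[i]:
            labels[i] = -1       # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = neighbours(i)
        while queue:
            j = queue.pop(0)
            if labels[j] == -1:  # reached from a core point: border, not noise
                labels[j] = cluster
            elif labels[j] is None:
                labels[j] = cluster
                visit_order.append(j)
                if is_core[j]:   # only core points expand the cluster further
                    queue.extend(neighbours(j))
    return labels, is_core, visit_order

# Hypothetical example (not the Artificial dataset).
pts = [(0, 0), (0, 1), (0, 2), (5, 5), (5, 5.5), (9, 9)]
labels, is_core, visit_order = dbscan(pts, eps=1.0, min_pts=2)
print(labels)       # cluster id per point, -1 for noise
print(is_core)      # True for core points
print(visit_order)  # index of each point in the order it was first visited
```

Under the self-counting convention assumed here, min_pts = 2 means any point with at least one other point within eps is a core point, so every non-core point ends up as noise.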
 – Spectral Clustering:
   1. Create from scratch the adjacency matrix of the unweighted, undirected similarity graph in which node i and node j are connected with an edge if datapoint i is among the 2 nearest neighbors of datapoint j, OR datapoint j is among the 2 nearest neighbors of datapoint i.
   2. Calculate the spectral embeddings of dimension 2 based on the Normalized Laplacian, and report them numerically in a LaTeX table in your report.
   3. Explain if and why these embeddings provide any benefit for clustering the original datapoints, as compared to clustering the original datapoints directly.
   4. Run k-means for k=2 and show a plot such that its horizontal axis corresponds to iterations of k-means and its vertical axis indicates the objective function of k-means at that iteration. Please denote the updates of the centroids and the updates of the assignments as different iterations. For example, if iteration i corresponds to an update of cluster assignments, then iteration i+1 should correspond to an update of the centroids where the cluster assignments remain the same as in iteration i.
   5. Show a scatterplot of the spectral embeddings (drawn as circles), colored by the cluster that k-means assigned them to, together with the calculated centroids (drawn as x's), each in the same color as the cluster it represents.
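Two parts of the spectral clustering steps above lend themselves to a short sketch: the symmetric "OR" rule for the 2-nearest-neighbor graph, and the k-means trace in which assignment updates and centroid updates are logged as separate iterations. Everything here is an illustration under assumptions: the function names are hypothetical, the example points are not the Artificial dataset, the symmetric form I - D^{-1/2} A D^{-1/2} is assumed for the Normalized Laplacian (the course may use the random-walk form), and the eigen-decomposition that turns the Laplacian into embeddings is deliberately left out, so k-means is run on the placeholder points directly.

```python
import math

def knn_adjacency(points, k=2):
    """A[i][j] = 1 if i is among the k nearest neighbours of j, OR vice versa."""
    n = len(points)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:        # distance ties fall back to index order
            A[i][j] = A[j][i] = 1   # writing both entries implements the OR
    return A

def normalized_laplacian(A):
    """Assumed symmetric form: L = I - D^{-1/2} A D^{-1/2}."""
    n = len(A)
    deg = [sum(row) for row in A]   # every node has degree >= k, so deg > 0
    return [[(1.0 if i == j else 0.0) - A[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]

def kmeans_trace(points, centroids):
    """Run k-means, logging the objective after every assignment update and
    every centroid update as separate iterations."""
    def objective(assign, cents):
        return sum(math.dist(p, cents[a]) ** 2 for p, a in zip(points, assign))

    assign, trace = None, []
    while True:
        new_assign = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:    # converged: assignments stopped changing
            return assign, centroids, trace
        assign = new_assign
        trace.append(("assign", objective(assign, centroids)))
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            # an empty cluster keeps its previous centroid (an assumption)
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c])
        centroids = new_centroids
        trace.append(("centroid", objective(assign, centroids)))

# Hypothetical example points and initial centroids.
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
A = knn_adjacency(pts, k=2)
L = normalized_laplacian(A)
assign, cents, trace = kmeans_trace(pts, [(0, 0), (1, 0)])
for step, (kind, value) in enumerate(trace):
    print(step, kind, round(value, 4))
```

Plotting the logged values against their step index gives the requested curve. Neither kind of update can increase the k-means objective, so a rising value at any step would point to a bug.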
 – Support Vector Machines (SVM): Consider the test datapoint [4, 4].
   1. Show a scatterplot of the dataset including the test datapoint, where
      – points in different classes are shown in different colors
      – the test datapoint is shown in a different color than all other datapoints
   2. Train your SVM with a linear kernel. If no issues come up, proceed to step 3. Otherwise, explain what went wrong and why, and proceed directly to step
   3. While training your SVM, show a plot such that
      – its horizontal axis corresponds to iterations of optimization updates
      – its vertical axis indicates the value of the objective function of the SVM at that iteration
      Explain if and why the behavior shown in this plot is correct or not.
   4. After training is finished:
      – report the calculated values of w and b (as defined in the course slides)
      – if you implemented a hard-margin SVM, denote in your scatterplot ALL the support vectors with filled circles and the rest of the points with empty circles
      – draw the separating hyperplane in your scatterplot
      – show exactly how your trained SVM decides the class of the test datapoint and which class it assigns to it
   5. Define the mapping [a, b] -> [a, b, 2ab/(a+b)] and transform all datapoints accordingly. Then repeat steps 1-4 for the transformed dataset. What do you observe? Explain if and why your SVM benefited from this transformation.
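The SVM steps above (objective value per iteration, final w and b, the decision rule for a test point, and the feature mapping) can be sketched as follows. This is only one possible setup and rests on assumptions the announcement does not confirm: a soft-margin primal objective with hinge loss, fixed-step subgradient descent as the optimizer, class labels encoded as -1 and +1, and hypothetical names (`dot`, `feature_map`, `objective`, `train_linear_svm`, `classify`) and example data. The course slides may define the objective, and w and b, differently, for instance through a hard-margin formulation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def feature_map(p):
    """[a, b] -> [a, b, 2ab/(a+b)]; undefined when a + b == 0."""
    a, b = p
    return (a, b, 2 * a * b / (a + b))

def objective(w, b, X, y, C):
    """Assumed soft-margin primal: 0.5*||w||^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge

def train_linear_svm(X, y, C=1.0, lr=0.001, n_iters=5000):
    """Fixed-step subgradient descent; returns w, b and the objective history."""
    w, b, history = [0.0] * len(X[0]), 0.0, []
    for _ in range(n_iters):
        grad_w, grad_b = list(w), 0.0      # gradient of 0.5*||w||^2
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge is active
                grad_w = [g - C * yi * v for g, v in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
        history.append(objective(w, b, X, y, C))
    return w, b, history

def classify(w, b, x):
    """Decision rule: the sign of w.x + b."""
    return 1 if dot(w, x) + b >= 0 else -1

# Hypothetical, linearly separable example (not the Artificial dataset).
X = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5)]
y = [-1, -1, -1, 1, 1, 1]
w, b, history = train_linear_svm(X, y)
print(w, b, classify(w, b, (4, 4)))

# The same trainer works on the mapped, 3-dimensional datapoints.
X3 = [feature_map(p) for p in X]
w3, b3, history3 = train_linear_svm(X3, y)
print(w3, b3, classify(w3, b3, feature_map((4, 4))))
```

With a fixed-step subgradient method the objective is not guaranteed to decrease at every iteration, so small oscillations in the plotted `history` are expected rather than a sign of a bug; a QP-based or hard-margin solver would behave differently.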