### test


**SEEM4113 Data Mining Final Examination Semester B 2020-**

**This exam consists of 4 problems and 5 pages (including this page). Answer ALL problems.**

*Note:*

1. In this exam, the following items are allowed:
   - (a) printed copies of the lecture notes and the homework solutions I have had delivered to you via CANVAS;
   - (b) your own hand-written notes;
   - (c) a university-approved calculator to aid your calculations.
2. All other materials and aids are not permitted during the whole exam.
3. Do your work using pen/pencil/paper. Do not use a word processor to type your answers.
4. Internet search is not permitted during the whole exam.
5. No communication related to the exam is allowed between you and any other people during the whole exam. (Exception: You can ask me questions via Zoom during the exam.)
6. Open Zoom while you are taking the exam. I will communicate with you via Zoom in the case of exam questions needing clarification.
7. Timing and submission deadline:
   - (i) The time allowed for this exam is 2 hours.
   - (ii) An additional 5 minutes will be added, for you to upload your answer to CANVAS. (Thus, you will have a total of 2 hours and 5 minutes to do the exam and upload your answer to CANVAS.)
   - (iii) To safeguard against possible delay with CANVAS, you are encouraged to also submit your answer to me via the e-mail [email protected] as well.
   - (iv) A submission made beyond the 2-hour-5-minute deadline will be subject to mark deduction (mark deduction = [number of minutes late]2 marks).
8. For your submission: Convert your answers to a pdf file (like what you have done for your homework submission). You may want to consider putting your answer sheets one-by-one on a clipboard, taking photos, and then converting them into a pdf file using free mobile phone software such as CamScanner.
9. Double-check that you have submitted the right file to the CANVAS dropbox! Failure to do so may lead to significant loss of marks.

```
Violating one or more of the above rules will lead to disciplinary action.
```

*Note: On the 1st page of your answer sheets, write down your student number for this class (but do not write your name). For all questions, support your answers with calculations.*

**Question 1**. (27 marks) You are working as a data analyst intern for the Tablet Division of the Delli Corporation. During the first week of your internship, your boss gave you the following small data set:

| ID | Age Group | Income Level | Self-employed | Credit Worthiness | Buy Tablet |
|----|-----------|--------------|---------------|-------------------|------------|
| 1  | Young     | high         | no            | fair              | no         |
| 2  | Young     | high         | no            | excellent         | no         |
| 3  | Middle    | high         | no            | fair              | yes        |
| 4  | Old       | medium       | no            | fair              | yes        |
| 5  | Old       | low          | yes           | fair              | yes        |
| 6  | Old       | low          | yes           | excellent         | no         |
| 7  | Middle    | low          | yes           | excellent         | yes        |
| 8  | Young     | medium       | no            | fair              | no         |
| 9  | Young     | low          | yes           | fair              | yes        |
| 10 | Old       | medium       | yes           | fair              | yes        |
| 11 | Young     | medium       | yes           | excellent         | yes        |
| 12 | Middle    | medium       | no            | excellent         | yes        |
| 13 | Middle    | high         | yes           | fair              | yes        |
| 14 | Old       | medium       | no            | excellent         | no         |

(a) Currently, the Tablet Division is trying to find out whether customers with the following characteristics are likely to buy a tablet from Delli:

```
Age Group: Young; Income level: medium; Self-employed: yes;
Credit worthiness: fair.
```

```
Your task: Using the naive Bayes method, predict whether such customers
will likely buy a tablet from Delli.
```

(b) Repeat the analysis you have done in part (a), for customers having the following characteristics:

```
Age Group: Young; Income level: medium; Credit worthiness: fair.
```
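The naive Bayes calculation asked for in parts (a) and (b) can be sketched as follows. This is a minimal illustration rather than the official solution: the function and variable names are made up for this example, probabilities are plain relative frequencies with no Laplace smoothing (the lecture notes may use a different convention), and an attribute missing from the query, as in part (b), simply contributes no factor.

```python
from collections import Counter

# Rows transcribed from the Question 1 table, ID 1 to ID 14:
# (age_group, income, self_employed, credit, buy_tablet)
ROWS = [
    ("Young", "high", "no", "fair", "no"),
    ("Young", "high", "no", "excellent", "no"),
    ("Middle", "high", "no", "fair", "yes"),
    ("Old", "medium", "no", "fair", "yes"),
    ("Old", "low", "yes", "fair", "yes"),
    ("Old", "low", "yes", "excellent", "no"),
    ("Middle", "low", "yes", "excellent", "yes"),
    ("Young", "medium", "no", "fair", "no"),
    ("Young", "low", "yes", "fair", "yes"),
    ("Old", "medium", "yes", "fair", "yes"),
    ("Young", "medium", "yes", "excellent", "yes"),
    ("Middle", "medium", "no", "excellent", "yes"),
    ("Middle", "high", "yes", "fair", "yes"),
    ("Old", "medium", "no", "excellent", "no"),
]
# Hypothetical attribute names used to index the tuple columns.
ATTRS = ("age_group", "income", "self_employed", "credit")


def naive_bayes_scores(rows, query):
    """Return the unnormalised score P(class) * prod P(value | class) per class.

    Assumption: relative-frequency estimates, no smoothing.
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class)
        for attr, value in query.items():
            col = ATTRS.index(attr)
            matches = sum(1 for r in rows if r[-1] == label and r[col] == value)
            score *= matches / n_label  # conditional P(value | class)
        scores[label] = score
    return scores


def predict(rows, query):
    """Pick the class with the largest unnormalised score."""
    scores = naive_bayes_scores(rows, query)
    return max(scores, key=scores.get)


query_a = {"age_group": "Young", "income": "medium",
           "self_employed": "yes", "credit": "fair"}
query_b = {"age_group": "Young", "income": "medium", "credit": "fair"}

for name, query in (("a", query_a), ("b", query_b)):
    print(name, naive_bayes_scores(ROWS, query), predict(ROWS, query))
```

Dividing each score by the sum of the two scores gives the posterior probabilities; the predicted class is unchanged by that normalisation.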

(c) Suppose that for the Age Group column, your company in fact has the more detailed data: 20, 25, 35, 45, 60, 55, 35, 25, 25, 45, 30, 40, 36, 50 (from ID 1 to ID 14). Thus, the Age Group column is now replaced by the Age column.

```
Your task: Use the Gaussian distribution to model the probability density
distribution of the Age column, and apply the naive Bayes method to
predict whether customers with the following characteristics will likely buy a
tablet from Delli:
```

```
Age: 35; Income level: medium; Self-employed: yes; Credit worthiness: fair.
```
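For part (c), one way to combine a Gaussian likelihood for Age with frequency-based likelihoods for the categorical attributes is sketched below. The helper names are illustrative, and the sketch assumes the sample standard deviation (dividing by n - 1); if the course uses the population variance instead, swap `stdev` for `pstdev`.

```python
import math
from statistics import mean, stdev

# Ages for ID 1 to ID 14, as listed in part (c).
AGES = [20, 25, 35, 45, 60, 55, 35, 25, 25, 45, 30, 40, 36, 50]
# Remaining columns from the Question 1 table:
# (income, self_employed, credit, buy_tablet)
OTHER = [
    ("high", "no", "fair", "no"), ("high", "no", "excellent", "no"),
    ("high", "no", "fair", "yes"), ("medium", "no", "fair", "yes"),
    ("low", "yes", "fair", "yes"), ("low", "yes", "excellent", "no"),
    ("low", "yes", "excellent", "yes"), ("medium", "no", "fair", "no"),
    ("low", "yes", "fair", "yes"), ("medium", "yes", "fair", "yes"),
    ("medium", "yes", "excellent", "yes"), ("medium", "no", "excellent", "yes"),
    ("high", "yes", "fair", "yes"), ("medium", "no", "excellent", "no"),
]


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def class_ages(label):
    """Ages of the training rows whose Buy Tablet value equals `label`."""
    return [age for age, row in zip(AGES, OTHER) if row[3] == label]


def score(label, age, income, self_employed, credit):
    """Unnormalised naive Bayes score for one class."""
    rows = [row for row in OTHER if row[3] == label]
    ages = class_ages(label)
    s = len(rows) / len(OTHER)                          # prior P(class)
    s *= gaussian_pdf(age, mean(ages), stdev(ages))     # Gaussian likelihood of Age
    for col, value in enumerate((income, self_employed, credit)):
        s *= sum(1 for row in rows if row[col] == value) / len(rows)
    return s


scores = {label: score(label, 35, "medium", "yes", "fair") for label in ("yes", "no")}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Because a density value replaces a probability for the Age factor, the scores are only meaningful relative to one another.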

**Question 2**. (27 marks)

(a) In collaboration with the Hong Kong Jockey Club, Dr. Evie at the CityU is experimenting with a new technique (code-named Prospector) for testing the presence of a certain chemical in horse urine. Upon testing a specimen, Prospector will output a score, indicating the likelihood of the presence of the chemical (the higher the score, the more likely).

```
As a student helper at Dr. Evie's laboratory, you are given the following data:
```

```
Sample ID Prospector Score True label
1 8 Yes
2 5 Yes
3 3 no
4 2 no
5 0 no
6 -3 Yes
7 -4 no
8 -5 no
```

```
Your task: Determine (i) the classification accuracy, (ii) the precision rate,
and (iii) the recall rate for each of the following cut-off scores: 4, 1. (That is,
compute (i), (ii) and (iii) when we predict Yes for samples with score 4 or
above. And compute (i), (ii) and (iii) when we predict Yes for samples
with score 1 or above.)
```
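The three measures can be computed from a confusion matrix at each cut-off, as in the short sketch below. The function name is illustrative, and "Yes" is treated as the positive class, with a sample predicted positive when its score is at or above the cut-off, as the task describes.

```python
# Prospector scores and true labels from the table (True means "Yes").
SCORES = [8, 5, 3, 2, 0, -3, -4, -5]
LABELS = [True, True, False, False, False, True, False, False]


def metrics_at_cutoff(scores, labels, cutoff):
    """Accuracy, precision and recall when predicting Yes for score >= cutoff.

    Note: precision is undefined if no sample is predicted Yes; that case
    does not arise for the cut-offs used here and is not handled.
    """
    preds = [s >= cutoff for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(not p and y for p, y in zip(preds, labels))
    tn = sum(not p and not y for p, y in zip(preds, labels))
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


for cutoff in (4, 1):
    print(cutoff, metrics_at_cutoff(SCORES, LABELS, cutoff))
```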

(b) Ultimately, PROSPECTOR is tested on 1000 samples (and yes, all the student helpers work day-and-night on this). The true classes of the 1000 samples are such that 50% of them are actually negative, and the remaining ones are actually positive. Meanwhile, the classifier Dr. Evie uses has a classification accuracy of 70% and a recall rate of 60%.

```
Your task: Compute the corresponding precision rate and false positive rate
of the classifier Dr. Evie uses.
```
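The confusion matrix in part (b) can be reconstructed from the class balance, the accuracy, and the recall. The sketch below uses exact fractions to avoid rounding; the function name and dictionary keys are illustrative.

```python
from fractions import Fraction


def rates_from_summary(n, positive_share, accuracy, recall):
    """Rebuild TP/FN/TN/FP from summary statistics, then derive the rates."""
    pos = n * positive_share      # actual positives
    neg = n - pos                 # actual negatives
    tp = recall * pos             # recall = TP / (TP + FN)
    fn = pos - tp
    tn = accuracy * n - tp        # accuracy = (TP + TN) / n
    fp = neg - tn
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "precision": tp / (tp + fp),
        "false_positive_rate": fp / neg,
    }


result = rates_from_summary(1000, Fraction(1, 2), Fraction(7, 10), Fraction(6, 10))
print(result)
```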

**Question 3**. (27 marks)

```
(a) Consider a perceptron with the following weights: w0 = 2, w1 = 1 and w2 = 1. On
the following x1-x2 graph (you can copy it onto your answer sheet), plot the
decision surface, and also label the region in which the perceptron would predict
class 1 and the region in which it would predict class 0.
```
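A small sketch can help check which side of the decision line maps to which class. It assumes the common convention that the perceptron outputs class 1 when w0 + w1·x1 + w2·x2 > 0 and class 0 otherwise; the exam's lecture notes may define the threshold differently (for example with ≥), so treat the convention and the helper names as assumptions.

```python
def make_perceptron(w0, w1, w2):
    """Return a predictor using the assumed rule: 1 if w0 + w1*x1 + w2*x2 > 0."""
    def predict(x1, x2):
        return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0
    return predict


def boundary_x2(w0, w1, w2, x1):
    """x2 coordinate of the decision line at a given x1 (requires w2 != 0)."""
    return -(w0 + w1 * x1) / w2


base = make_perceptron(2, 1, 1)

# Two points on the decision line: where it crosses each axis.
print(boundary_x2(2, 1, 1, 0), boundary_x2(2, 1, 1, -2))
# One probe point on each side of the line.
print(base(0, 0), base(-3, -3))
```

Under this convention the line is x2 = -x1 - 2, the origin lies in the class 1 region, and points below and to the left of the line fall in the class 0 region.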

```
(b) In addition to (w0 = 2, w1 = 1 and w2 = 1), consider the following two additional
perceptrons: (i) (w0 = 100, w1 = 50 and w2 = 50), (ii) (w0 = 2, w1 = -1 and w2 = 1).
```

```
Your task: For each of (i) and (ii), describe its relationship with respect to
the perceptron (w0 = 2, w1 = 1 and w2 = 1).
```
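One way to explore the relationships numerically is to compare predictions over a grid of points. The sketch below again assumes the output rule "class 1 if w0 + w1·x1 + w2·x2 > 0", and the names `BASE`, `SCALED` and `FLIPPED` are labels invented for this example.

```python
from itertools import product


def predict(w, x1, x2):
    """Assumed rule: class 1 if w0 + w1*x1 + w2*x2 > 0, else class 0."""
    w0, w1, w2 = w
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0


BASE = (2, 1, 1)
SCALED = (100, 50, 50)   # perceptron (i): every weight is 50 times the base weight
FLIPPED = (2, -1, 1)     # perceptron (ii): the sign of w1 is reversed

GRID = list(product(range(-5, 6), repeat=2))

# (i) Multiplying all weights by a positive constant leaves the sign of the
# weighted sum unchanged, so the predictions should coincide everywhere.
same_as_base = all(predict(SCALED, a, b) == predict(BASE, a, b) for a, b in GRID)

# (ii) Negating w1 is equivalent to evaluating the base perceptron at (-x1, x2),
# i.e. reflecting the decision line about the x2 axis.
mirror_of_base = all(predict(FLIPPED, a, b) == predict(BASE, -a, b) for a, b in GRID)

# (ii) is nevertheless a different classifier from the base perceptron.
differs_from_base = any(predict(FLIPPED, a, b) != predict(BASE, a, b) for a, b in GRID)

print(same_as_base, mirror_of_base, differs_from_base)
```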

```
(c) Consider the following data set with only one attribute x :
```

```
The solid circles represent samples belonging to class 1, while the unfilled
circles represent samples belonging to class 0.
```

```
Your task: Construct a single perceptron that can achieve perfect
classification for this data set.
```

**Question 4**. (19 marks) Consider the following training data set:

The attributes of the data set are: Shape, Filled-or-not, and Colour.

(a) Compute the entropy of the whole data set above.
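The training data set is not reproduced in this copy of the exam, so the sketch below only illustrates the entropy formula H = -Σ p_i log2 p_i on hypothetical class counts. The counts and the function name are assumptions, not values from the exam.

```python
import math


def entropy(class_counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(class_counts)
    h = 0.0
    for count in class_counts:
        if count > 0:  # a class with zero members contributes nothing
            p = count / total
            h -= p * math.log2(p)
    return h


# Hypothetical examples: a perfectly balanced and a pure two-class set.
print(entropy([4, 4]))
print(entropy([8, 0]))
```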

(b) Based on the Gini index, construct the full decision tree for this training data set.
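Choosing a split by the Gini index means computing, for each attribute, the size-weighted Gini impurity of the groups it produces and picking the attribute with the smallest value. The sketch below shows one such step. Only the attribute names come from the question; the rows, the attribute values, and the `label` column are hypothetical, since the actual training set is not reproduced here.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_of_split(rows, attr, target="label"):
    """Weighted Gini impurity after a multi-way split on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


# Hypothetical toy rows, NOT the exam's training data.
TOY_ROWS = [
    {"Shape": "circle", "Filled-or-not": "yes", "Colour": "red", "label": "+"},
    {"Shape": "circle", "Filled-or-not": "no", "Colour": "blue", "label": "+"},
    {"Shape": "square", "Filled-or-not": "yes", "Colour": "red", "label": "-"},
    {"Shape": "square", "Filled-or-not": "no", "Colour": "blue", "label": "-"},
]
ATTRIBUTES = ("Shape", "Filled-or-not", "Colour")

split_scores = {a: gini_of_split(TOY_ROWS, a) for a in ATTRIBUTES}
best_attribute = min(split_scores, key=split_scores.get)
print(split_scores, best_attribute)
```

Building the full tree repeats this step on each resulting subset until every leaf is pure or no attributes remain.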

(c) Consider the following test data set:

Your task: (i) Apply the decision tree you have obtained in (b) to this test data set. What are the resulting predictions? (ii) What is the resulting classification accuracy?