express | redis | Algorithm代写 | 代做Objective – ASSESSMENT : COMP0078A7UD

UNIVERSITY COLLEGE LONDON

express | redis | Algorithm代写 | 代做Objective – 这道题目是利用redis进行的编程代写任务, 包括了express/redis/Algorithm/Objective等方面

算法代写 代写算法 Algorithm代写 代写Algorithm  算法作业代写

EXAMINATION FOR INTERNAL STUDENTS

MODULE CODE : COMP

ASSESSMENT : COMP0078A7UD

PATTERN

MODULE NAME : Supervised Learning

LEVEL: : Undergraduate ( Masters Level)

DATE : Friday 26-Aug-

TIME : 10:

Controlled-Condition exam: 3 hours exam

You cannot submit your work after the date and time shown on

AssessmentUCL you must ensure to allow sufficient time to upload and

hand in your work

This paper is suitable for candidates who attended classes for this

module in the following academic year(s):

Year

2021/

Duration 3 hours
Additional time for converting handwritten

notes to PDF where applicable^10 minutes^

Upload window^20 minutes^

Total time 3 hours 30 minutes

Additional material
N/A
Special instructions
N/A
Exam paper word
count
N/A

TURN OVER

Supervised Learning, COMP0078 () Late Summer Assessment Period, 21/

Suitable for Cohorts: 21/

Answer ALL EIGHT questions. Notation:Let[m] :={ 1 ,…,m}. We also overload notation so that

[pred] :=
1 p redis true
0 predis false
.

Recommended answer lengths are only for informational purposes, no points will be deducted for shorter or longer answers.

Marks for each part of each question are indicated in square brackets. Standard calculators are permitted.

  1. a. i. Using approximately 2-6 sentences explainempirical error (risk) minimisation (ERM). ii. In what situation is ERM likely to give poor results? Give an example. And what word is used to describe this situation? [5 marks]
b. Can(ordinary linear) least squaresbe interpreted as an example of ERM? Please
explain why or why not.
[5 marks]
COMP0078 1 TURN OVER
  1. This questions concerns kernel methods. For each of the following functionsK:R^2 R^2 Rargue whether they are a valid kernel (i.e. the kernel can be written as an inner product is some feature space) and when the answer is positive derive an associated feature map representation. a. K(x,t) =x>Dt, whereDis the matrix 1 0 0 5
[2 marks]
b. K(x,t) =x>Dt, whereDis the matrix

1 2
2 4
[3 marks]
c. K(x,t) = exp(x 1 t 1 ), wherex 1 is the first component of the vectorxand, likewise,
t 1 is the first component of the vectort.
[2 marks]
d. K(x,t) =x>t(x>t)^2.
[3 marks]
Now prove that ifK:R^2 R^2 Ris a given valid kernel then the following transformed
kernels are also valid:
e. K(Ax,At), whereAis a given 2  2 matrix.
[2 marks]
f. f(x)K(x,t)f(t), wherefis a given real-valued function.
[3 marks]
COMP0078 2 CONTINUED
  1. a. Consider the following form of the dual optimisation for Support Vector Machine training with a linear kernel(K(x,t) =x,t): maximise L() =mi=1mj=1ijyiyjxi,xj subject to 0 iC, i= 1,…,m, m i=1i= 1 and
m
i=1iyi= 0;
If we consider the primal weight vector
w=
m
i=
?ixi,
how can we  express the objectiveL(?)in terms ofw?
[5 marks]
b. State without justification what are the effects of each of the following changes
(individually) to the optimisation of part 3.a):
i. SettingC=
ii. Removing the constraintmi=1iyi= 0
iii. SettingC=and replacing 1 by 2 in the constraintmi=1i= 2
[5 marks]
c. For case iii of part 3.b), what are the values of

i:yi=
i and
i:yi= 1
i?
[5 marks]
COMP0078 3 TURN OVER
  1. a. Given the data set with five examples,
((1,1),+1),((1,1),+1),(( 1 ,1),+1),(( 1 ,1),+1),((0,0),1)
Plot the dataset. Consider training a classifier with Adaboost using decision stumps.
Indicate which example(s) have their weights increased after a single iteration of
boosting. Explain why.
[5 marks]
b. Given a datasetS={(x 1 ,y 1 ),...,(xm,ym)} RdRand positive scale factors
a(0,)dandc(0,)along with offsetsbRdanddRwe define the
transformsfa,b(x) := (a 1 x 1 +b 1 ,a 2 x 2 +b 2 ,...,adxd+bd)andgc,d(y) :=cy+
d. Consider the regression tree corresponding to the greedy recursive partitioning
procedure with respect to square error on the datasetS.
i. Suppose we first transform the dataset to{(fa,b(x 1 ),y 1 ),...,(fa,b(xm),ym)}
then compute the regression tree. Will the computed regression tree change if
so how?
ii. Suppose we first transform the dataset to{(x 1 ,gc,d(y 1 )),...,(xm,gc,d(ym))}
then compute the regression tree. Will the computed regression tree change if
so how?
Give an argument justifying your answers to part i and ii
[5 marks]
COMP0078 4 CONTINUED
  1. Full feedback.Consider the following extension of the expert setting tokclasses,
Protocol:
Natureselectsx 1 ,x 2 ,...,xT[k]nandy 1 ,y 2 ,...,yT[k].
Fort= 1ToTDo
Receive expert predictions xt[k]n
Predict yt[k]
Receive label yt[k]
Design a randomised  Algorithm with a good upper bound on the expected regret,
E
[T
t=
[yt 6 =yt]
]
mini[n]
T
t=
[i 6 =yt]
give your algorithm design and argument for the upper bound.
[10 marks]
COMP0078 5 TURN OVER
  1. Define, h(x) =
1 dxd+d 1 xd^1 ++ 0  0
 1 dxd+d 1 xd^1 ++ 0 < 0
whereh:R{ 1 , 1 }and then define the hypothesis class,
Hd:={h:Rd+1}.
a. Suppose you are given a dataset for which there exists anhHdwhich fits the data
with zero error. Describe a technique which can find anh Hdwith zero error.
(Expected answer length 1-4 sentences).
[5 marks]
b. We consider the following learning model (as you may recall from our lectures)
Model :Data is sampled IID from a distributionDoverX YwithY={ 1 , 1 }.
Theexpected error(AKAgeneralisation error) of a functionh:X Yis
LD(h) =D({(x,y) :h(x) 6 =y}) =P(x,y)D[h(x) 6 =y]
Suppose you have an algorithm that given a dataset can output a hypothesishoHd
with minimum error on your dataset. We now consider two settings theconsistent
and theinconsistentsetting. In the consistent setting there exists anh Hdsuch
thatLD(h) = 0and in the inconsistent setting there does not exist anh Hdsuch
thatLD(h) = 0.
What does the PAC learning theory tell us about the amount data required to learn
well in both the consistent and inconsistent setting. In your answer contrast the two
settings as well as explain how the results specialise to the hypothesis classHd.
[10 marks]
COMP0078 6 CONTINUED
  1. Recall the minimum cut method method for semi-supervised learning. Suppose we have two labeled vertices a vertexpwith the value of +1 and a vertexqwill the vertex 1 . Suppose the value of the minimum cut Objective function is 6.
a. Draw a partially labeled (unweighted) graph with only two labeled verticespandq
so that if we then calculated the value of the minimum cut objective function is 6.
[5 marks]
b. Given the setting of the question. What if anything can we say about the effective
resistance between vertexpandq? Explain your reasoning,
[10 marks]
COMP0078 7 TURN OVER
  1. Partial feedback.Consider the following bandit problem withkclasses,
Protocol:
Natureselects` 1 ,` 2 ,...,`T{ 0 , 1 }k.
Fort= 1ToTDo
Predict yt[k]
Receive loss `t,yt{ 0 , 1 }
Design a randomised algorithm with a good upper bound on the expected regret (with
respect to the best class),
E
[T
t=
`t,yt
]
minj[k]
T
t=
`t,j
give your algorithm design and an argument for your upper bound.
[10 marks]
COMP0078 8 END OF PAPER