Lecture 2: Learning in neural networks
COSC 420
Lech Szymanski
Department of Computer Science, University of Otago
March 8, 2022
Math Review: Linear algebra

- Dot product of a d-dim vector and a d-dim vector
  $$\begin{bmatrix} y_1 & \dots & y_d \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ \vdots \\ w_d \end{bmatrix} = y_1 w_1 + \dots + y_d w_d = \sum_{i=1}^{d} y_i w_i$$
- Dot product of a d-dim vector and a $d \times k$ matrix
  $$\begin{bmatrix} y_1 & \dots & y_d \end{bmatrix} \cdot \begin{bmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{d} y_i w_{i1} & \dots & \sum_{i=1}^{d} y_i w_{ik} \end{bmatrix}$$
- Transpose of a $d \times k$ matrix
  $$\begin{bmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{bmatrix}^T = \begin{bmatrix} w_{11} & \dots & w_{d1} \\ \vdots & \ddots & \vdots \\ w_{1k} & \dots & w_{dk} \end{bmatrix}$$
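As a quick sanity check of the identities above, here is a minimal NumPy sketch; the array values below are arbitrary examples of mine, not from the lecture:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])          # a d-dim vector (d = 3)
w = np.array([0.5, -1.0, 2.0])         # another d-dim vector
W = np.array([[1.0, 0.0],              # a d x k matrix (d = 3, k = 2)
              [2.0, 1.0],
              [0.0, 3.0]])

# Vector-vector dot product: y . w = sum_i y_i w_i
print(y @ w, np.sum(y * w))            # both give 1*0.5 + 2*(-1) + 3*2 = 4.5

# Vector-matrix dot product: a k-dim result with entries sum_i y_i w_ik
print(y @ W)                           # [ 5. 11.]

# Transpose swaps rows and columns: (d x k) -> (k x d)
print(W.T.shape)                       # (2, 3)
```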
Math Review: Linear algebra

- Dot product of an $n \times d$ matrix and a $d \times k$ matrix
  $$\begin{bmatrix} y_{11} & \dots & y_{1d} \\ \vdots & \ddots & \vdots \\ y_{n1} & \dots & y_{nd} \end{bmatrix} \cdot \begin{bmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{d} y_{1i} w_{i1} & \dots & \sum_{i=1}^{d} y_{1i} w_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{d} y_{ni} w_{i1} & \dots & \sum_{i=1}^{d} y_{ni} w_{ik} \end{bmatrix}$$
- Product of a scalar and a k-dim vector
  $$\alpha \begin{bmatrix} w_1 & \dots & w_k \end{bmatrix} = \begin{bmatrix} \alpha w_1 & \dots & \alpha w_k \end{bmatrix}$$
- Sum of a k-dim vector and a k-dim vector
  $$\begin{bmatrix} v_1 & \dots & v_k \end{bmatrix} + \begin{bmatrix} b_1 & \dots & b_k \end{bmatrix} = \begin{bmatrix} v_1 + b_1 & \dots & v_k + b_k \end{bmatrix}$$
- Product of a scalar and a $d \times k$ matrix
- Sum of a $d \times k$ and a $d \times k$ matrix
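The same operations carry over to matrices; a short NumPy illustration, again with made-up example values:

```python
import numpy as np

Y = np.arange(6, dtype=float).reshape(2, 3)   # an n x d matrix (n = 2, d = 3)
W = np.ones((3, 2))                           # a d x k matrix (d = 3, k = 2)
alpha = 0.1                                   # a scalar

print(Y @ W)             # n x k matrix with entries sum_i y_ni w_ik
print(alpha * W)         # scalar times matrix: every entry is scaled by alpha
print(W + W)             # sum of two d x k matrices, element by element

v = np.array([1.0, 2.0])
b = np.array([0.5, 0.5])
print(alpha * v, v + b)  # scalar times k-dim vector, and vector + vector
```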
Math Review: Linear algebra

- Element-wise product of a d-dim and a d-dim vector
  $$\begin{bmatrix} y_1 & \dots & y_d \end{bmatrix} \odot \begin{bmatrix} w_1 & \dots & w_d \end{bmatrix} = \begin{bmatrix} y_1 w_1 & y_2 w_2 & \dots & y_d w_d \end{bmatrix}$$
- Outer product of a d-dim and a k-dim vector
  $$\begin{bmatrix} y_1 \\ \vdots \\ y_d \end{bmatrix} \otimes \begin{bmatrix} w_1 & \dots & w_k \end{bmatrix} = \begin{bmatrix} y_1 w_1 & y_1 w_2 & \dots & y_1 w_k \\ \vdots & \vdots & \ddots & \vdots \\ y_d w_1 & y_d w_2 & \dots & y_d w_k \end{bmatrix}$$
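Both products have direct NumPy counterparts; a minimal sketch with example vectors of my own choosing:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])    # d-dim vector
w = np.array([4.0, 5.0, 6.0])    # d-dim vector
u = np.array([1.0, 10.0])        # k-dim vector (k = 2)

# Element-wise (Hadamard) product: [y_1 w_1, ..., y_d w_d]
print(y * w)                     # [ 4. 10. 18.]

# Outer product: d x k matrix with entries y_i u_j
print(np.outer(y, u))            # [[ 1. 10.], [ 2. 20.], [ 3. 30.]]
```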
Math Review: Multivariate calculus

- Derivative of $y = f(x)\colon \mathbb{R} \rightarrow \mathbb{R}$
  $$\frac{dy}{dx} = \frac{df(x)}{dx} = f'(x)$$
- Partial derivative of $y = f(x_1, \dots, x_d)\colon \mathbb{R}^d \rightarrow \mathbb{R}$
  $$\frac{\partial y}{\partial x_1} = \frac{\partial f(x_1, \dots, x_d)}{\partial x_1} \quad \dots \quad \frac{\partial y}{\partial x_d} = \frac{\partial f(x_1, \dots, x_d)}{\partial x_d}$$
- Gradient of $y = f(x_1, \dots, x_d)\colon \mathbb{R}^d \rightarrow \mathbb{R}$
  $$\nabla f(x_1, \dots, x_d) = \begin{bmatrix} \frac{\partial f(x_1, \dots, x_d)}{\partial x_1} & \dots & \frac{\partial f(x_1, \dots, x_d)}{\partial x_d} \end{bmatrix}$$
- Chain rule
  $$\frac{\partial f\big(g(x)\big)}{\partial x} = \frac{\partial f\big(g(x)\big)}{\partial g(x)} \, \frac{\partial g(x)}{\partial x}$$
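One way to convince yourself of the chain rule is to compare the analytic derivative with a finite-difference estimate; a small sketch, where the choice of f and g is mine:

```python
import numpy as np

# Chain rule check on f(g(x)) with f(u) = u**2 and g(x) = sin(x):
# the analytic derivative is 2*g(x)*cos(x); compare against a central difference.
def g(x):
    return np.sin(x)

def f_of_g(x):
    return g(x) ** 2

x0 = 0.7
analytic = 2.0 * g(x0) * np.cos(x0)                    # (df/dg) * (dg/dx)
h = 1e-6
numeric = (f_of_g(x0 + h) - f_of_g(x0 - h)) / (2 * h)  # finite difference
print(analytic, numeric)                               # the two values agree closely
```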
Math Review: Probability theory

- Probability density function of random variable $X$
  $$\int_a^b p(x)\,dx = \Pr[a \leq X \leq b] \quad \text{and} \quad \int_{-\infty}^{\infty} p(x)\,dx = 1$$
- Joint probability density function of random variables $X_1, \dots, X_n$
  $$\int \dots \int p(x_1, \dots, x_n)\,dx_1 \dots dx_n = 1$$
- Independence of $n$ random variables $X_1, \dots, X_n$
  $$p(x_1, \dots, x_n) = p(x_1)\,p(x_2) \dots p(x_n) = \prod_{i=1}^{n} p(x_i)$$
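A quick numerical check of the normalisation and independence conditions, using a standard Gaussian density as my example:

```python
import numpy as np

# Check that a standard Gaussian density integrates to (almost) 1,
# mirroring the normalisation condition above.
x, dx = np.linspace(-10, 10, 100_001, retstep=True)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(np.sum(p) * dx)            # ~= 1.0 (simple Riemann sum)

# For independent variables the joint density factorises, p(x1, x2) = p(x1) p(x2),
# so the double integral of the product is also ~= 1.
joint = np.outer(p, p)
print(np.sum(joint) * dx * dx)   # ~= 1.0
```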
Multilayer Feed-Forward Neural Networks (MLFF)
Iterative Supervised Learning

Require: Data of N input-output pairs $(x_1, y_1), \dots, (x_N, y_N)$ and model $y^{[L]} = f(x, w)$
  $t \leftarrow 0$
  Set $w_0$
  while $t < T$ do
    $n \leftarrow 0$
    while $n < N$ do
      Evaluate $y_n^{[L]} \leftarrow f(x_n, w_t)$
      Compute $\Delta w$ based on the discrepancy between $y_n$ and $y_n^{[L]}$
      $w_{t+1} \leftarrow w_t + \Delta w$ (so that $f(x_n, w_t)$ is closer to $y_n$)
      $n \leftarrow n + 1$
    end while
    $t \leftarrow t + 1$
  end while
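A minimal Python sketch of this loop, assuming the caller supplies the model f(x, w), an update rule delta_w, and the training pairs; these names are placeholders of mine rather than definitions from the lecture:

```python
import numpy as np

def train(data, f, delta_w, w0, T):
    """Iterative supervised learning: sweep the dataset T times,
    nudging the weights after every (x_n, y_n) pair.

    data    -- list of (x_n, y_n) input-output pairs
    f       -- model: f(x, w) -> network output y^[L]
    delta_w -- update rule: delta_w(x, y, y_out, w) -> weight change
    w0      -- initial weights (e.g. a NumPy array)
    """
    w = w0
    for t in range(T):                           # outer loop over sweeps
        for x_n, y_n in data:                    # inner loop over training pairs
            y_out = f(x_n, w)                    # evaluate y^[L] = f(x_n, w_t)
            dw = delta_w(x_n, y_n, y_out, w)     # change based on the discrepancy
            w = w + dw                           # w_{t+1} = w_t + delta_w
    return w
```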
Hebbian learning

$$w_{ij} := w_{ij} + y_i\, y_j \tag{1}$$
Perceptron learning rule

$$w_{ij} := w_{ij} + y_i \left( y_j - y_j^{[L]} \right) \tag{2}$$
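A small NumPy sketch of these two update rules acting on a single layer of weights; the toy inputs, targets, and the hard-limiting output used for the perceptron are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
y_in = np.array([1.0, 0.0, 1.0])        # pre-synaptic outputs y_i
target = np.array([1.0, 0.0])           # desired post-synaptic outputs y_j
W = rng.normal(0.0, 0.5, (3, 2))        # weights w_ij (3 inputs, 2 outputs)

# Hebbian update (eq. 1): strengthen w_ij when y_i and y_j are active together
W_hebb = W + np.outer(y_in, target)

# Perceptron update (eq. 2): move weights by the input times the output error
output = np.heaviside(y_in @ W, 0.0)    # hard-limited layer output
W_perc = W + np.outer(y_in, target - output)

print(W_hebb, W_perc, sep="\n")
```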
Delta rule

$$w_{ij}^{[l]} := w_{ij}^{[l]} + y_i^{[l-1]} \left( \frac{\partial y_k^{[L]}}{\partial v_j^{[l]}} \right) \left( y_k - y_k^{[L]} \right) \tag{3}$$

$$w_{ij}^{[l]} := w_{ij}^{[l]} + y_i^{[l-1]} \left( \frac{\partial y_j^{[l]}}{\partial v_j^{[l]}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial y_j^{[l]}} \right) \dots \left( \frac{\partial y_o^{[L-1]}}{\partial y_n^{[L-2]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial y_o^{[L-1]}} \right) \left( y_k - y_k^{[L]} \right) \tag{4}$$

$$w_{ij}^{[l]} := w_{ij}^{[l]} + y_i^{[l-1]} \left( \frac{\partial y_j^{[l]}}{\partial v_j^{[l]}} \right) \left( \frac{\partial v_m^{[l+1]}}{\partial y_j^{[l]}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial v_m^{[l+1]}} \right) \dots \left( \frac{\partial v_o^{[L-1]}}{\partial y_n^{[L-2]}} \right) \left( \frac{\partial y_o^{[L-1]}}{\partial v_o^{[L-1]}} \right) \left( \frac{\partial v_k^{[L]}}{\partial y_o^{[L-1]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial v_k^{[L]}} \right) \left( y_k - y_k^{[L]} \right) \tag{5}$$
Activation functions

$$\frac{\partial y_k}{\partial y_i} = \frac{\partial y_k}{\partial v_k} \, \frac{\partial v_k}{\partial y_i} \tag{6}$$

$$\frac{\partial y_k}{\partial v_k} = \frac{\partial \varphi(v_k)}{\partial v_k} \tag{7}$$
Name    | Function | Derivative
Hardlim | $\varphi(v_k) = \begin{cases} 0 & v_k \leq 0 \\ 1 & v_k > 0 \end{cases}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = 0$
Sigmoid | $\varphi(v_k) = \frac{1}{1 + e^{-v_k}}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = \varphi(v_k)\big(1 - \varphi(v_k)\big)$
Tanh    | $\varphi(v_k) = \frac{e^{v_k} - e^{-v_k}}{e^{v_k} + e^{-v_k}}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = 1 - \varphi(v_k)^2$
ReLU    | $\varphi(v_k) = \begin{cases} 0 & v_k \leq 0 \\ v_k & v_k > 0 \end{cases}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = \begin{cases} 0 & v_k \leq 0 \\ 1 & v_k > 0 \end{cases}$
Linear  | $\varphi(v_k) = v_k$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = 1$
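The functions and derivatives in the table translate directly into NumPy; a minimal sketch, applied element-wise to an example activity vector:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_deriv(v):
    s = sigmoid(v)
    return s * (1.0 - s)               # phi(v) * (1 - phi(v))

def tanh_deriv(v):
    return 1.0 - np.tanh(v) ** 2       # 1 - phi(v)^2

def relu(v):
    return np.where(v > 0, v, 0.0)

def relu_deriv(v):
    return np.where(v > 0, 1.0, 0.0)

def hardlim(v):
    return np.where(v > 0, 1.0, 0.0)   # its derivative is 0 wherever it exists

v = np.array([-2.0, 0.0, 2.0])
print(sigmoid(v), sigmoid_deriv(v), relu(v), relu_deriv(v))
```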
MSE loss

$$e_k = y_k - y_k^{[L]} \tag{8}$$

$$\mathcal{L}\big(E_k \,|\, X, Y_k\big) = \prod_{n=1}^{N} p(e_{k,n}) \tag{9}$$

$$p(e_k) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(e_k)^2}{2} \right) \tag{10}$$

$$\ln \mathcal{L}\big(E_k \,|\, X, Y_k\big) = \sum_{n=1}^{N} \ln p(e_{k,n}) \tag{11}$$

$$\operatorname*{arg\,min}_{w} \Big[ -\ln \mathcal{L}\big(E_k \,|\, X, Y_k\big) \Big] = \operatorname*{arg\,min}_{w} \left[ -\sum_{n=1}^{N} \ln p(e_{k,n}) \right] \tag{12}$$

$$= \operatorname*{arg\,min}_{w} \left[ \sum_{n=1}^{N} e_{k,n}^2 \right] \tag{13}$$

$$J_k = \sum_{n=1}^{N} \left( y_{k,n} - y_{k,n}^{[L]} \right)^2 \tag{14}$$
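A short numerical illustration of why minimising the negative log-likelihood under the unit-Gaussian error model comes down to minimising the sum of squared errors; the targets and outputs below are made-up numbers:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])     # desired outputs y_{k,n}
y_out = np.array([0.8, 0.2, 0.6, 0.9])      # network outputs y^[L]_{k,n}
e = y_true - y_out                          # errors e_{k,n}  (eq. 8)

sse = np.sum(e ** 2)                        # sum-of-squares objective (eq. 14)

# Negative log-likelihood under p(e) = exp(-e^2/2) / sqrt(2*pi):
nll = -np.sum(-0.5 * e ** 2 - 0.5 * np.log(2 * np.pi))

# The two objectives differ only by constants that do not depend on the weights.
print(sse, 2 * nll - len(e) * np.log(2 * np.pi))   # identical values
```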
Loss surface

Gradient descent

$$J_k = \sum_{n=1}^{N} \left( y_{k,n} - y_{k,n}^{[L]} \right)^2 \tag{14}$$

$$\frac{\partial J_k}{\partial w_{ij}} = -2 \sum_{n=1}^{N} \frac{\partial y_{k,n}^{[L]}}{\partial w_{ij}} \left( y_{k,n} - y_{k,n}^{[L]} \right) \tag{15}$$

$$w := w - \epsilon \left( \nabla_w J_k \right) := w - \epsilon \begin{bmatrix} \frac{\partial J_k}{\partial w_1} & \dots & \frac{\partial J_k}{\partial w_{ij}} & \dots & \frac{\partial J_k}{\partial w_U} \end{bmatrix} \tag{16}$$

$$w_{ij} := w_{ij} - \epsilon \frac{\partial J_k}{\partial w_{ij}} \tag{17}$$

$$\frac{\partial y_k^{[L]}}{\partial w_{ij}} = \left( \frac{\partial y_j^{[l]}}{\partial w_{ij}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial y_j^{[l]}} \right) \dots \left( \frac{\partial y_n^{[L-1]}}{\partial y_m^{[L-2]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial y_n^{[L-1]}} \right) = y_i^{[l-1]} \left( \frac{\partial y_j^{[l]}}{\partial v_j^{[l]}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial y_j^{[l]}} \right) \dots \left( \frac{\partial y_n^{[L-1]}}{\partial y_m^{[L-2]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial y_n^{[L-1]}} \right) \tag{18}$$
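A minimal gradient-descent sketch on a toy quadratic loss, just to show the update of equations 16 and 17 in code; the loss, starting point and learning rate are my own choices:

```python
import numpy as np

def J(w):
    return np.sum((w - 3.0) ** 2)        # toy loss surface with its minimum at w = 3

def grad_J(w):
    return 2.0 * (w - 3.0)               # gradient of the toy loss

w = np.array([0.0, 10.0])                # initial weights
eps = 0.1                                # learning rate (epsilon in eq. 16/17)
for step in range(100):
    w = w - eps * grad_J(w)              # w := w - eps * grad_w J
print(w, J(w))                           # w ends up close to [3, 3], loss near 0
```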
Backpropagation (with MSE loss)

$$y_L = \varphi_L\bigg( \varphi_{L-1}\Big( \dots \varphi_2\big( \varphi_1( x W_1 + b_1 ) W_2 + b_2 \big) \dots \Big) W_L + b_L \bigg) \tag{19}$$

$$\begin{aligned}
W_L &:= W_L + y_{L-1} \otimes \left[ (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right] \\
b_L &:= b_L + \left[ (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right] \\
W_{L-1} &:= W_{L-1} + y_{L-2} \otimes \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \right] \\
b_{L-1} &:= b_{L-1} + \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \right] \\
&\;\;\vdots \\
W_l &:= W_l + y_{l-1} \otimes \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \cdots W_{l+1}^T \odot \tfrac{\partial \varphi_l(v_l)}{\partial v_l} \right] \\
b_l &:= b_l + \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \cdots W_{l+1}^T \odot \tfrac{\partial \varphi_l(v_l)}{\partial v_l} \right]
\end{aligned} \tag{20}$$
Backpropagation (with MSE loss)

where $U_l$ is the number of neurons in layer $l$:

- $v_l = \begin{bmatrix} v_1^{[l]} & \dots & v_{U_l}^{[l]} \end{bmatrix}$ is the $1 \times U_l$ activity vector of layer $l$.
- $y_l = \begin{bmatrix} y_1^{[l]} & \dots & y_{U_l}^{[l]} \end{bmatrix}$ is the $1 \times U_l$ output vector of layer $l$.
- $\frac{\partial \varphi_l(v_l)}{\partial v_l} = \begin{bmatrix} \frac{\partial \varphi_l(v_1^{[l]})}{\partial v_1^{[l]}} & \dots & \frac{\partial \varphi_l(v_{U_l}^{[l]})}{\partial v_{U_l}^{[l]}} \end{bmatrix}$
- $W_l = \begin{bmatrix} w_{11}^{[l]} & \dots & w_{1 U_l}^{[l]} \\ \vdots & \ddots & \vdots \\ w_{U_{l-1} 1}^{[l]} & \dots & w_{U_{l-1} U_l}^{[l]} \end{bmatrix}$ is the $U_{l-1} \times U_l$ matrix of weights in layer $l$.
- $b_l = \begin{bmatrix} b_1^{[l]} & \dots & b_{U_l}^{[l]} \end{bmatrix}$ is the $1 \times U_l$ vector of bias values in layer $l$.
- $v_l = y_{l-1} \cdot W_l + b_l$
- $y_i^{[l]} = \varphi_l(v_i^{[l]})$
- $y_0 = x$ is the network input
- $y$ is the desired network output
- $W_l^T$ is the transpose of $W_l$
- $\cdot$ denotes the dot product
- $\otimes$ denotes the outer product
- $\odot$ denotes the element-wise product
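Putting equations 19 and 20 together with this notation, here is a compact NumPy sketch of backpropagation for a small fully connected network with sigmoid layers; the architecture, toy OR dataset and learning rate are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 4, 1]                                   # U_0 (input), U_1 (hidden), U_2 (output)
Ws = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros((1, b)) for b in sizes[1:]]

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, y, Ws, bs, eps=0.5):
    # Forward pass: v_l = y_{l-1} . W_l + b_l,  y_l = phi_l(v_l)   (cf. eq. 19)
    ys = [x]
    for W, b in zip(Ws, bs):
        ys.append(sigmoid(ys[-1] @ W + b))
    # Backward pass: start from (y - y_L) (.) phi'_L(v_L), then push it back
    # through W_l^T and multiply by phi'_l(v_l) at each layer      (cf. eq. 20)
    delta = (y - ys[-1]) * ys[-1] * (1.0 - ys[-1])
    for l in reversed(range(len(Ws))):
        W_old = Ws[l].copy()               # keep pre-update weights for the chain
        Ws[l] += eps * ys[l].T @ delta     # outer product y_{l-1} (x) delta
        bs[l] += eps * delta
        if l > 0:
            delta = (delta @ W_old.T) * ys[l] * (1.0 - ys[l])

# Illustration: a few sweeps over a toy OR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)
for _ in range(2000):
    for x_n, y_n in zip(X, Y):
        backprop_step(x_n[None, :], y_n[None, :], Ws, bs)
print(sigmoid(sigmoid(X @ Ws[0] + bs[0]) @ Ws[1] + bs[1]))   # approaches [0, 1, 1, 1]
```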
Cross-entropy (CE) loss

$$P(y_k = 1) = y_k^{[L]} \tag{21}$$

$$P(y_k = 0) = 1 - y_k^{[L]} \tag{22}$$

$$\mathcal{L}\big(y_k; y_k^{[L]} \,|\, X, Y_k\big) = \prod_{n=1}^{N} p\big(y_{k,n}; y_{k,n}^{[L]}\big) \tag{23}$$

$$p\big(y_k; y_k^{[L]}\big) = \big(y_k^{[L]}\big)^{y_k} \big(1 - y_k^{[L]}\big)^{(1 - y_k)} \tag{24}$$

$$\ln \mathcal{L}\big(y_k; y_k^{[L]} \,|\, X, Y_k\big) = \sum_{n=1}^{N} \ln p\big(y_{k,n}; y_{k,n}^{[L]}\big) \tag{25}$$

$$\operatorname*{arg\,min}_{w} \Big[ -\ln \mathcal{L}\big(y_k; y_k^{[L]} \,|\, X, Y_k\big) \Big] = \operatorname*{arg\,min}_{w} \left[ -\sum_{n=1}^{N} \ln p\big(y_{k,n}; y_{k,n}^{[L]}\big) \right] \tag{26}$$

$$= \operatorname*{arg\,min}_{w} \left[ -\sum_{n=1}^{N} \Big( y_{k,n} \ln\big(y_{k,n}^{[L]}\big) + (1 - y_{k,n}) \ln\big(1 - y_{k,n}^{[L]}\big) \Big) \right] \tag{27}$$
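The objective in the last line is the binary cross-entropy loss; written out in NumPy with arbitrary example targets and outputs:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])      # class labels y_{k,n} in {0, 1}
y_out = np.array([0.9, 0.1, 0.7, 0.6])       # network outputs y^[L]_{k,n} in (0, 1)

# Negative log-likelihood of the Bernoulli model = cross-entropy loss
ce = -np.sum(y_true * np.log(y_out) + (1.0 - y_true) * np.log(1.0 - y_out))
print(ce)                                     # smaller when y_out matches y_true
```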
Backpropagation (with CE loss)

Just replace $(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L}$ with $y \odot (1 - y_L) - (1 - y) \odot y_L$ in equation 20; everything else stays the same.
Softmax cross-entropy loss

Name    | Function | Derivative
Softmax | $\varphi(v_k) = \frac{e^{v_k}}{\sum_j e^{v_j}}$ | $\frac{\partial \varphi(v_k)}{\partial v_j} = \begin{cases} \varphi(v_k)\big(1 - \varphi(v_k)\big) & j = k \\ -\varphi(v_j)\,\varphi(v_k) & j \neq k \end{cases}$
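A NumPy sketch of the softmax function and its derivative (as a Jacobian matrix), matching the two cases in the table; the input vector is an arbitrary example:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))          # shift by the max for numerical stability
    return e / np.sum(e)

def softmax_jacobian(v):
    s = softmax(v)
    # d phi(v_k) / d v_j = phi(v_k)(1 - phi(v_k)) if j == k, else -phi(v_j) phi(v_k)
    return np.diag(s) - np.outer(s, s)

v = np.array([1.0, 2.0, 0.5])
print(softmax(v))                      # positive entries that sum to 1
print(softmax_jacobian(v))             # diagonal s_k(1 - s_k), off-diagonal -s_j s_k
```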
Classification vs. regression

Problem type                 | Num. outputs | Loss | Output layer activation function
Regression                   | Single       | MSE  | Linear
Regression                   | Multi        | MSE  | Linear
Classification (categorical) | Single       | CE   | Sigmoid
Classification (categorical) | Multi        | CE   | Softmax
Classification (multi-label) | Multi        | CE   | Sigmoid

Classification with 2 classes is done with a single output; for more than 2 classes, multiple outputs are required.
Summary

- Math review: basic concepts from linear algebra, multivariate calculus and probability theory
- Mathematical expression for the output of a feed-forward neural network (MLFF)
- Iterative learning algorithm
- Rules for updating network parameters: Hebbian, perceptron, delta
- Generalisation of the delta rule for different activation functions and different loss functions
- Derivation of the delta rule from optimisation and the gradient descent principle
- Backpropagation - efficient computation of the delta rule
- Mean squared error loss for regression vs. cross-entropy loss for classification
- Softmax activation for categorical classification
References...

Here are the references I relied on for this lecture:

- On supervised Hebbian learning and the perceptron learning rule: Hagan et al., Neural Network Design, https://hagan.okstate.edu/NNDesign.pdf, 2014, Ch. 7 and Ch. 4 respectively.
- On backpropagation: Haykin, Neural Networks: A Comprehensive Foundation, 1998, Ch. 4 (Multilayer Perceptrons)... but many other books will do.
- On maximum likelihood and the derivation of MSE: Bishop, Pattern Recognition and Machine Learning, 2007, Ch. 1.
Study guide I

- Review your maths:
  Review: and understand the maths from Sections 2.1 - 2.6 of the Linear Algebra chapter of Goodfellow et al.'s Deep Learning book.
  Read: and understand the maths from Sections 3.1 - 3.10 of the Probability and Information Theory chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concepts and mechanics of iterative and supervised learning.
  Study: the concept of a weight update rule (not memorising different rules, just what a weight update rule is all about).
  Study: the concept of an activation function; do not memorise the maths formulas, but have some idea about the properties of different activation functions: hard limiting, sigmoid, tanh, ReLU, linear.
  Read: Section 4.3 of the Numerical Computation chapter of Goodfellow et al.'s Deep Learning book.
Study guide II

  Study: the concept of a loss function.
  Read: on optimisation, Sections 8.1 - 8.3 and 8.5 of the Optimization chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concepts of a loss surface and gradient descent.
  Read: on gradient-based learning in Section 6.3 of the Deep Feedforward Networks chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concept of training neural networks with backpropagation - how does it work, what's error blame, and what are the necessary conditions (in terms of network architecture) for it to work?
  Study: the concept of the softmax activation function - what does it do, and why is it usually used at the output of a neural network?
  Study: the difference between regression and classification.
  Study: the difference between the mean squared error (MSE) and cross-entropy (CE) loss functions.
Study guide III

  Study: and understand the table from Slide 17 - how to set up the architecture and loss function appropriately for different problems.
- Advanced extras:
  Read: on supervised learning and maximum likelihood (MLE) inference in Ch. 2.6 and 8.2.2 of Hastie's The Elements of Statistical Learning.
  Exercise: Derive the MSE and CE errors from the MLE principle (following Slides 11 and 15).