Lecture 2: Learning in neural networks

COSC 420
Lech Szymanski

Department of Computer Science, University of Otago

March 8, 2022

Math Review: Linear algebra

  • Dot product of a d-dim vector and a d-dim vector
$$\begin{pmatrix} y_1 & \dots & y_d \end{pmatrix} \cdot \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} = y_1 w_1 + \dots + y_d w_d = \sum_{i=1}^{d} y_i w_i$$
  • Dot product of a d-dim vector and a d × k matrix
$$\begin{pmatrix} y_1 & \dots & y_d \end{pmatrix} \cdot \begin{pmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{d} y_i w_{i1} & \dots & \sum_{i=1}^{d} y_i w_{ik} \end{pmatrix}$$
  • Transpose of a d × k matrix
$$\begin{pmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{pmatrix}^{T} = \begin{pmatrix} w_{11} & \dots & w_{d1} \\ \vdots & \ddots & \vdots \\ w_{1k} & \dots & w_{dk} \end{pmatrix}$$

Math Review: Linear algebra

  • Dot product of an n × d matrix and a d × k matrix
$$\begin{pmatrix} y_{11} & \dots & y_{1d} \\ \vdots & \ddots & \vdots \\ y_{n1} & \dots & y_{nd} \end{pmatrix} \cdot \begin{pmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{d} y_{1i} w_{i1} & \dots & \sum_{i=1}^{d} y_{1i} w_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{d} y_{ni} w_{i1} & \dots & \sum_{i=1}^{d} y_{ni} w_{ik} \end{pmatrix}$$
  • Product of a scalar and a k-dim vector
$$\alpha \begin{pmatrix} w_1 & \dots & w_k \end{pmatrix} = \begin{pmatrix} \alpha w_1 & \dots & \alpha w_k \end{pmatrix}$$
  • Sum of a k-dim vector and a k-dim vector
$$\begin{pmatrix} v_1 & \dots & v_k \end{pmatrix} + \begin{pmatrix} b_1 & \dots & b_k \end{pmatrix} = \begin{pmatrix} v_1 + b_1 & \dots & v_k + b_k \end{pmatrix}$$
  • Product of a scalar and a d × k matrix
  • Sum of a d × k matrix and a d × k matrix

Math Review: Linear algebra

  • Element-wise product of a d-dim and a d-dim vector
$$\begin{pmatrix} y_1 & \dots & y_d \end{pmatrix} \odot \begin{pmatrix} w_1 & \dots & w_d \end{pmatrix} = \begin{pmatrix} y_1 w_1 & y_2 w_2 & \dots & y_d w_d \end{pmatrix}$$
  • Outer product of a d-dim and a k-dim vector
$$\begin{pmatrix} y_1 \\ \vdots \\ y_d \end{pmatrix} \begin{pmatrix} w_1 & \dots & w_k \end{pmatrix} = \begin{pmatrix} y_1 w_1 & y_1 w_2 & \dots & y_1 w_k \\ \vdots & \vdots & \ddots & \vdots \\ y_d w_1 & y_d w_2 & \dots & y_d w_k \end{pmatrix}$$
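
A quick numpy sketch of these operations (my own illustration; the array values are arbitrary and chosen only to show the shapes):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])          # d-dim vector (d = 3)
w = np.array([0.5, -1.0, 2.0])         # d-dim vector
W = np.arange(6.0).reshape(3, 2)       # d x k matrix (k = 2)
Y = np.ones((4, 3))                    # n x d matrix (n = 4)

print(y @ w)           # dot product of two d-dim vectors: sum_i y_i w_i
print(y @ W)           # d-dim vector times d x k matrix: a 1 x k result
print(W.T)             # transpose: k x d
print(Y @ W)           # n x d matrix times d x k matrix: an n x k result
print(2.0 * w)         # scalar times vector
print(y * w)           # element-wise product
print(np.outer(y, w))  # outer product: d x d here, d x k in general
```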

Math Review: Multivariate calculus

  • Derivative of $y = f(x)\colon \mathbb{R} \to \mathbb{R}$
$$\frac{dy}{dx} = \frac{df(x)}{dx} = f'(x) = \nabla f(x)$$
  • Partial derivative of $y = f(x_1, \dots, x_d)\colon \mathbb{R}^d \to \mathbb{R}$
$$\frac{\partial y}{\partial x_1} = \frac{\partial f(x_1, \dots, x_d)}{\partial x_1} \qquad \dots \qquad \frac{\partial y}{\partial x_d} = \frac{\partial f(x_1, \dots, x_d)}{\partial x_d}$$
  • Gradient of $y = f(x_1, \dots, x_d)\colon \mathbb{R}^d \to \mathbb{R}$
$$\nabla f(x_1, \dots, x_d) = \begin{pmatrix} \dfrac{\partial f(x_1, \dots, x_d)}{\partial x_1} & \dots & \dfrac{\partial f(x_1, \dots, x_d)}{\partial x_d} \end{pmatrix}$$
  • Chain rule
$$\frac{\partial f\bigl(g(x)\bigr)}{\partial x} = \frac{\partial f\bigl(g(x)\bigr)}{\partial g(x)} \cdot \frac{\partial g(x)}{\partial x}$$
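
As a short worked example of these definitions (a toy function of my own choosing, not from the slides), take $f(x_1, x_2) = (x_1 x_2)^2$, viewed through the chain rule as $f(g) = g^2$ with $g(x_1, x_2) = x_1 x_2$:

```latex
% Worked toy example: gradient via the chain rule
% f(x_1, x_2) = (x_1 x_2)^2, written as f(g) = g^2 with g(x_1, x_2) = x_1 x_2
\[
\frac{\partial f}{\partial x_1}
  = \frac{\partial f}{\partial g}\,\frac{\partial g}{\partial x_1}
  = 2g \cdot x_2
  = 2 x_1 x_2^2,
\qquad
\frac{\partial f}{\partial x_2}
  = 2g \cdot x_1
  = 2 x_1^2 x_2,
\qquad
\nabla f(x_1, x_2) = \begin{pmatrix} 2 x_1 x_2^2 & 2 x_1^2 x_2 \end{pmatrix}.
\]
```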

Math Review: Probability theory

  • Probability density function of a random variable $X$
$$\int_{a}^{b} p(x)\,dx = \Pr[a \le X \le b] \qquad \text{and} \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$$
  • Joint probability density function of random variables $X_1, \dots, X_n$
$$\int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} p(x_1, \dots, x_n)\,dx_1 \dots dx_n = 1$$
  • Independence of $n$ random variables $X_1, \dots, X_n$
$$p(x_1, \dots, x_n) = p(x_1)\,p(x_2)\dots p(x_n) = \prod_{i=1}^{n} p(x_i)$$

Multilayer Feed-Forward Neural Networks (MLFF)

Iterative Supervised Learning

Require: Data of N input-output pairs $(x_1, y_1), \dots, (x_N, y_N)$ and model $y^{[L]} = f(x, w)$
  $t \leftarrow 0$
  Set $w_0$
  while $t < T$ do
    $n \leftarrow 0$
    while $n < N$ do
      Evaluate $y^{[L]}_n \leftarrow f(x_n, w_t)$
      Compute $\Delta w$ based on the discrepancy between $y_n$ and $y^{[L]}_n$
      $w_{t+1} \leftarrow w_t + \Delta w$ (so that $f(x_n, w_t)$ is closer to $y_n$)
      $n \leftarrow n + 1$
    end while
    $t \leftarrow t + 1$
  end while
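
A minimal Python sketch of this loop, assuming a generic model f(x, w) and leaving the update rule as a placeholder for whichever rule the following slides introduce:

```python
def train(f, delta_w, X, Y, w0, T):
    """Iterative supervised learning loop (sketch).

    f       : model, f(x, w) -> prediction y_hat
    delta_w : update rule, delta_w(x, y, y_hat, w) -> weight change
    X, Y    : the N input-output pairs
    w0      : initial weights
    T       : number of passes over the data
    """
    w = w0
    for t in range(T):                        # outer loop over passes
        for x_n, y_n in zip(X, Y):            # inner loop over the N examples
            y_hat = f(x_n, w)                 # evaluate y^[L]_n = f(x_n, w_t)
            dw = delta_w(x_n, y_n, y_hat, w)  # discrepancy-driven weight change
            w = w + dw                        # w_{t+1} = w_t + delta w
    return w
```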

Hebbian learning

$$w_{ij} := w_{ij} + \eta\, y_i\, y_j \qquad (1)$$

Perceptron learning rule

$$w_{ij} := w_{ij} + \eta\, y_i \left(y_j - y^{[L]}_j\right) \qquad (2)$$
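
A small numpy sketch of these two update rules for one layer of weights (the learning rate eta and the example vectors are assumptions for illustration):

```python
import numpy as np

eta = 0.1                              # learning rate (assumed value)
y_pre = np.array([0.5, -1.0, 0.3])     # presynaptic outputs y_i
y_target = np.array([1.0, 0.0])        # desired outputs y_j
y_out = np.array([0.2, 0.6])           # actual outputs y_j^[L]
W = np.zeros((3, 2))                   # weights w_ij

# Hebbian learning, eq. (1): strengthen w_ij when y_i and y_j are active together
W += eta * np.outer(y_pre, y_target)

# Perceptron rule, eq. (2): change weights in proportion to the output error
W += eta * np.outer(y_pre, y_target - y_out)
```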

Delta rule

$$w^{[l]}_{ij} := w^{[l]}_{ij} + \eta\, y^{[l-1]}_i \left(\frac{\partial y^{[L]}_k}{\partial v^{[l]}_j}\right)\left(y_k - y^{[L]}_k\right) \qquad (3)$$

$$w^{[l]}_{ij} := w^{[l]}_{ij} + \eta\, y^{[l-1]}_i \left(\frac{\partial y^{[l]}_j}{\partial v^{[l]}_j}\right)\left(\frac{\partial y^{[l+1]}_m}{\partial y^{[l]}_j}\right)\cdots\left(\frac{\partial y^{[L-1]}_o}{\partial y^{[L-2]}_n}\right)\left(\frac{\partial y^{[L]}_k}{\partial y^{[L-1]}_o}\right)\left(y_k - y^{[L]}_k\right) \qquad (4)$$

$$w^{[l]}_{ij} := w^{[l]}_{ij} + \eta\, y^{[l-1]}_i \left(\frac{\partial y^{[l]}_j}{\partial v^{[l]}_j}\right)\left(\frac{\partial v^{[l+1]}_m}{\partial y^{[l]}_j}\right)\left(\frac{\partial y^{[l+1]}_m}{\partial v^{[l+1]}_m}\right)\cdots\left(\frac{\partial v^{[L-1]}_o}{\partial y^{[L-2]}_n}\right)\left(\frac{\partial y^{[L-1]}_o}{\partial v^{[L-1]}_o}\right)\left(\frac{\partial v^{[L]}_k}{\partial y^{[L-1]}_o}\right)\left(\frac{\partial y^{[L]}_k}{\partial v^{[L]}_k}\right)\left(y_k - y^{[L]}_k\right) \qquad (5)$$

Activation functions

$$\frac{\partial y_k}{\partial y_i} = \frac{\partial y_k}{\partial v_k}\,\frac{\partial v_k}{\partial y_i} \qquad (6)$$
$$\frac{\partial y_k}{\partial v_k} = \frac{\partial \varphi(v_k)}{\partial v_k} \qquad (7)$$

| Name | Function | Derivative |
|---|---|---|
| Hardlim | $\varphi(v_k) = \begin{cases} 0 & v_k \le 0 \\ 1 & v_k > 0 \end{cases}$ | $\dfrac{\partial \varphi(v_k)}{\partial v_k} = 0$ |
| Sigmoid | $\varphi(v_k) = \dfrac{1}{1 + e^{-v_k}}$ | $\dfrac{\partial \varphi(v_k)}{\partial v_k} = \varphi(v_k)\bigl(1 - \varphi(v_k)\bigr)$ |
| Tanh | $\varphi(v_k) = \dfrac{e^{v_k} - e^{-v_k}}{e^{v_k} + e^{-v_k}}$ | $\dfrac{\partial \varphi(v_k)}{\partial v_k} = 1 - \varphi(v_k)^2$ |
| ReLU | $\varphi(v_k) = \begin{cases} 0 & v_k \le 0 \\ v_k & v_k > 0 \end{cases}$ | $\dfrac{\partial \varphi(v_k)}{\partial v_k} = \begin{cases} 0 & v_k \le 0 \\ 1 & v_k > 0 \end{cases}$ |
| Linear | $\varphi(v_k) = v_k$ | $\dfrac{\partial \varphi(v_k)}{\partial v_k} = 1$ |
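
A minimal numpy sketch of these activation functions and their derivatives (the function names are my own; each operates element-wise on an array of activities v):

```python
import numpy as np

def hardlim(v):
    return np.where(v > 0, 1.0, 0.0)

def d_hardlim(v):
    return np.zeros_like(v)        # zero almost everywhere: no gradient signal

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def d_sigmoid(v):
    s = sigmoid(v)
    return s * (1.0 - s)

def tanh(v):
    return np.tanh(v)

def d_tanh(v):
    return 1.0 - np.tanh(v) ** 2

def relu(v):
    return np.where(v > 0, v, 0.0)

def d_relu(v):
    return np.where(v > 0, 1.0, 0.0)

def linear(v):
    return v

def d_linear(v):
    return np.ones_like(v)
```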

MSE loss

$$e_k = y_k - y^{[L]}_k \qquad (8)$$
$$\mathcal{L}\bigl(E_k \mid X, Y_k\bigr) = \prod_{n=1}^{N} p(e_{k,n}) \qquad (9)$$
$$p(e_k) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(e_k)^2}{2}\right) \qquad (10)$$
$$\ln \mathcal{L}\bigl(E_k \mid X, Y_k\bigr) = \sum_{n=1}^{N} \ln p(e_{k,n}) \qquad (11)$$
$$\arg\min_{w}\left[-\ln \mathcal{L}\bigl(E_k \mid X, Y_k\bigr)\right] = \arg\min_{w}\left[-\sum_{n=1}^{N} \ln p(e_{k,n})\right] \qquad (12)$$
$$= \arg\min_{w}\left[\sum_{n=1}^{N} e^2_{k,n}\right] \qquad (13)$$
$$J_k = \sum_{n=1}^{N} \left(y_{k,n} - y^{[L]}_{k,n}\right)^2 \qquad (14)$$
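
Filling in the step between equations (11) and (13), assuming the unit-variance Gaussian error model in (10):

```latex
% From Gaussian log-likelihood to squared error (unit-variance case)
\[
\ln p(e_{k,n})
  = \ln\!\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{e_{k,n}^2}{2},
\qquad\text{so}\qquad
-\sum_{n=1}^{N}\ln p(e_{k,n})
  = \frac{1}{2}\sum_{n=1}^{N} e_{k,n}^2 + \text{const},
\]
% and neither the constant nor the factor 1/2 changes the arg min over w,
% which gives equation (13).
```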

Loss surface

Gradient descent

$$J_k = \sum_{n=1}^{N} \left(y_{k,n} - y^{[L]}_{k,n}\right)^2 \qquad (14)$$
$$\frac{\partial J_k}{\partial w_{ij}} = -\sum_{n=1}^{N} \frac{\partial y^{[L]}_{k,n}}{\partial w_{ij}}\left(y_{k,n} - y^{[L]}_{k,n}\right) \qquad (15)$$
$$w := w - \eta\left(\nabla_{w} J_k\right) := w - \eta\begin{pmatrix} \dfrac{\partial J_k}{\partial w_1} & \dots & \dfrac{\partial J_k}{\partial w_{ij}} & \dots & \dfrac{\partial J_k}{\partial w_U} \end{pmatrix} \qquad (16)$$
$$w_{ij} := w_{ij} - \eta\,\frac{\partial J_k}{\partial w_{ij}} \qquad (17)$$
$$\frac{\partial y^{[L]}_k}{\partial w_{ij}} = \left(\frac{\partial y^{[l]}_j}{\partial w_{ij}}\right)\left(\frac{\partial y^{[l+1]}_m}{\partial y^{[l]}_j}\right)\cdots\left(\frac{\partial y^{[L-1]}_n}{\partial y^{[L-2]}_m}\right)\left(\frac{\partial y^{[L]}_k}{\partial y^{[L-1]}_n}\right) = y^{[l-1]}_i\left(\frac{\partial y^{[l]}_j}{\partial v^{[l]}_j}\right)\left(\frac{\partial y^{[l+1]}_m}{\partial y^{[l]}_j}\right)\cdots\left(\frac{\partial y^{[L-1]}_n}{\partial y^{[L-2]}_m}\right)\left(\frac{\partial y^{[L]}_k}{\partial y^{[L-1]}_n}\right) \qquad (18)$$
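
A minimal numpy sketch of plain gradient descent on the loss in (14), using a finite-difference gradient in place of the backpropagated one (the toy single-layer linear model, names, and values below are my own assumptions):

```python
import numpy as np

def loss(W, X, Y):
    """J = sum of squared errors, eq. (14), for a toy single-layer linear model."""
    return np.sum((Y - X @ W) ** 2)

def numerical_grad(W, X, Y, eps=1e-6):
    """Finite-difference estimate of dJ/dW, one weight at a time."""
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        G[idx] = (loss(Wp, X, Y) - loss(Wm, X, Y)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))        # N = 10 inputs, d = 3
W_true = rng.normal(size=(3, 2))    # weights that generate the targets
Y = X @ W_true                      # desired outputs
W = np.zeros((3, 2))                # initial weights
eta = 0.01                          # learning rate

for step in range(1000):            # w := w - eta * grad_w J,  eq. (16)/(17)
    W -= eta * numerical_grad(W, X, Y)

print(loss(W, X, Y))                # should be close to zero after training
```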

Backpropagation (with MSE loss)

$$y_L = \varphi_L\Bigl(\varphi_{L-1}\bigl(\dots\varphi_2\bigl(\varphi_1(x \cdot W_1 + b_1)\cdot W_2 + b_2\bigr)\dots\bigr)\cdot W_L + b_L\Bigr) \qquad (19)$$

$$\begin{aligned}
W_L &:= W_L + \eta\, y_{L-1} \otimes \left[(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L}\right]\\
b_L &:= b_L + \eta \left[(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L}\right]\\
W_{L-1} &:= W_{L-1} + \eta\, y_{L-2} \otimes \left[(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L} \cdot W_L^{T} \odot \frac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}}\right]\\
b_{L-1} &:= b_{L-1} + \eta \left[(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L} \cdot W_L^{T} \odot \frac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}}\right]\\
&\;\;\vdots\\
W_l &:= W_l + \eta\, y_{l-1} \otimes \left[(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L} \cdot W_L^{T} \odot \frac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \cdots W_{l+1}^{T} \odot \frac{\partial \varphi_l(v_l)}{\partial v_l}\right]\\
b_l &:= b_l + \eta \left[(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L} \cdot W_L^{T} \odot \frac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \cdots W_{l+1}^{T} \odot \frac{\partial \varphi_l(v_l)}{\partial v_l}\right]
\end{aligned} \qquad (20)$$
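
A compact numpy sketch of equations (19) and (20) for one training example, using the row-vector convention $v_l = y_{l-1} \cdot W_l + b_l$ defined on the next slide (sigmoid activations in every layer and the variable names are my own choices for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def d_sigmoid(v):
    s = sigmoid(v)
    return s * (1.0 - s)

def backprop_step(x, y, Ws, bs, eta=0.1):
    """One update of all layers for a single example, following eq. (19)/(20).

    x  : input vector of length d (1-D array)
    y  : desired output vector of length U_L (1-D array)
    Ws : list of weight matrices [W_1, ..., W_L], W_l of shape U_{l-1} x U_l
    bs : list of bias vectors [b_1, ..., b_L], b_l of length U_l
    """
    # Forward pass, eq. (19): v_l = y_{l-1} . W_l + b_l,  y_l = phi_l(v_l)
    ys, vs = [x], []
    for W, b in zip(Ws, bs):
        v = ys[-1] @ W + b
        vs.append(v)
        ys.append(sigmoid(v))

    # Backward pass, eq. (20): start from the output error and push the
    # "blame" back through W_{l+1}^T and each layer's activation derivative.
    delta = (y - ys[-1]) * d_sigmoid(vs[-1])      # (y - y_L) (.) dphi_L/dv_L
    for l in reversed(range(len(Ws))):
        dW = np.outer(ys[l], delta)               # y_{l-1} outer [...]
        db = delta.copy()
        if l > 0:                                 # propagate before updating W_l
            delta = (delta @ Ws[l].T) * d_sigmoid(vs[l - 1])
        Ws[l] += eta * dW                         # W_l := W_l + eta y_{l-1} (x) [...]
        bs[l] += eta * db                         # b_l := b_l + eta [...]
    return Ws, bs
```

Calling backprop_step repeatedly over the N training pairs reproduces the inner loop of the iterative supervised learning algorithm shown earlier.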

Backpropagation (with MSE loss)

where $U_l$ is the number of neurons in layer $l$:
  • $v_l = \begin{pmatrix} v^{[l]}_1 & \dots & v^{[l]}_{U_l} \end{pmatrix}$ is the $1 \times U_l$ activity vector of layer $l$.
  • $y_l = \begin{pmatrix} y^{[l]}_1 & \dots & y^{[l]}_{U_l} \end{pmatrix}$ is the $1 \times U_l$ output vector of layer $l$.
  • $\dfrac{\partial \varphi_l(v_l)}{\partial v_l} = \begin{pmatrix} \dfrac{\partial \varphi_l(v^{[l]}_1)}{\partial v^{[l]}_1} & \dots & \dfrac{\partial \varphi_l(v^{[l]}_{U_l})}{\partial v^{[l]}_{U_l}} \end{pmatrix}$
  • $W_l = \begin{pmatrix} w^{[l]}_{11} & \dots & w^{[l]}_{1 U_l} \\ \vdots & \ddots & \vdots \\ w^{[l]}_{U_{l-1} 1} & \dots & w^{[l]}_{U_{l-1} U_l} \end{pmatrix}$ is the $U_{l-1} \times U_l$ matrix of weights in layer $l$.
  • $b_l = \begin{pmatrix} b^{[l]}_1 & \dots & b^{[l]}_{U_l} \end{pmatrix}$ is the $1 \times U_l$ vector of bias values in layer $l$.
  • $v_l = y_{l-1} \cdot W_l + b_l$
  • $y^{[l]}_i = \varphi_l\bigl(v^{[l]}_i\bigr)$
  • $y_0 = x$ is the network input
  • $y$ is the desired network output
  • $W_l^{T}$ is the transpose of $W_l$
  • $\cdot$ denotes the dot product
  • $\otimes$ denotes the outer product
  • $\odot$ denotes the element-wise product

Cross-entropy (CE) loss

$$P(y_k = 1) = y^{[L]}_k \qquad (19)$$
$$P(y_k = 0) = 1 - y^{[L]}_k \qquad (20)$$
$$\mathcal{L}\bigl(y_k; y^{[L]}_k \mid X, Y_k\bigr) = \prod_{n=1}^{N} p\bigl(y_{k,n}; y^{[L]}_{k,n}\bigr) \qquad (21)$$
$$p\bigl(y_k; y^{[L]}_k\bigr) = \bigl(y^{[L]}_k\bigr)^{y_k}\bigl(1 - y^{[L]}_k\bigr)^{(1 - y_k)} \qquad (22)$$
$$\ln \mathcal{L}\bigl(y_k; y^{[L]}_k \mid X, Y_k\bigr) = \sum_{n=1}^{N} \ln p\bigl(y_{k,n}; y^{[L]}_{k,n}\bigr) \qquad (23)$$
$$\arg\min_{w}\Bigl[-\ln \mathcal{L}\bigl(y_k; y^{[L]}_k \mid X, Y_k\bigr)\Bigr] = \arg\min_{w}\Bigl[-\sum_{n=1}^{N} \ln p\bigl(y_{k,n}; y^{[L]}_{k,n}\bigr)\Bigr] \qquad (24)$$
$$= \arg\min_{w}\Bigl[-\sum_{n=1}^{N} \Bigl(y_{k,n} \ln\bigl(y^{[L]}_{k,n}\bigr) + \bigl(1 - y_{k,n}\bigr)\ln\bigl(1 - y^{[L]}_{k,n}\bigr)\Bigr)\Bigr] \qquad (25)$$
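
A minimal numpy sketch of the negative log-likelihood in (25) for a single output unit over N examples (function and variable names are my own; the eps clipping is a practical guard against log(0), not part of the derivation):

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy: the negative of eq. (25) summand, summed over N,
    for targets y in {0, 1} and predictions y_hat in (0, 1)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

y = np.array([1.0, 0.0, 1.0])        # targets
y_hat = np.array([0.9, 0.2, 0.6])    # network outputs y^[L]
print(bce_loss(y, y_hat))            # smaller when predictions match the targets
```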

Backpropagation (with CE loss)

Just replace $(y - y_L) \odot \dfrac{\partial \varphi_L(v_L)}{\partial v_L}$ with $y \odot (1 - y_L) - (1 - y) \odot y_L$ in equation (20); everything else stays the same.
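
Assuming a sigmoid output layer, this replacement term simplifies to the plain output error, which is why the explicit $\partial \varphi_L(v_L)/\partial v_L$ factor no longer appears:

```latex
% The CE replacement term reduces to the plain output error:
\[
y \odot (1 - y_L) - (1 - y) \odot y_L
  = y - y \odot y_L - y_L + y \odot y_L
  = y - y_L .
\]
```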

Softmax cross-entropy loss

| Name | Function | Derivative |
|---|---|---|
| Softmax | $\varphi(v_k) = \dfrac{e^{v_k}}{\sum_j e^{v_j}}$ | $\dfrac{\partial \varphi(v_k)}{\partial v_j} = \begin{cases} \varphi(v_k)\bigl(1 - \varphi(v_k)\bigr) & j = k \\ -\varphi(v_j)\,\varphi(v_k) & j \ne k \end{cases}$ |
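
A short numpy sketch of the softmax and its Jacobian from the table (my own names; subtracting the maximum is a standard numerical-stability trick, not part of the slide):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))      # subtract max for numerical stability
    return e / np.sum(e)

def softmax_jacobian(v):
    """d phi(v_k) / d v_j as a matrix: diag(phi) - outer(phi, phi)."""
    p = softmax(v)
    return np.diag(p) - np.outer(p, p)

v = np.array([2.0, 1.0, 0.1])
print(softmax(v))                  # entries sum to 1
print(softmax_jacobian(v))         # diagonal: p_k(1 - p_k); off-diagonal: -p_j p_k
```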

Classification vs. regression

| Problem type | Num. outputs | Loss | Output-layer activation function |
|---|---|---|---|
| Regression | single or multi | MSE | Linear |
| Classification (categorical, 2 classes) | single | CE | Sigmoid |
| Classification (categorical, more than 2 classes) | multi | CE | Softmax |
| Classification (multi-label) | multi | CE | Sigmoid |

Classification with 2 classes is done with a single output; for more than 2 classes, multiple outputs are required.

Summary

  • Math review: basic concepts from linear algebra, multivariate calculus and probability theory
  • Mathematical expression for the output of a multilayer feed-forward neural network (MLFF)
  • Iterative learning algorithm
  • Rules for updating network parameters: Hebbian, perceptron, delta
  • Generalisation of the delta rule for different activation functions and different loss functions
  • Derivation of the delta rule from optimisation and the gradient descent principle
  • Backpropagation: efficient computation of the delta rule
  • Mean squared error loss for regression vs. cross-entropy loss for classification
  • Softmax activation for categorical classification

References…

Here are the references I relied on for this lecture:
  • On supervised Hebbian learning and the perceptron learning rule: Hagan et al., Neural Network Design, https://hagan.okstate.edu/NNDesign.pdf, 2014, Ch. 7 and Ch. 4 respectively.
  • On backpropagation: Haykin, Neural Networks: A Comprehensive Foundation, 1998, Ch. 4, Multilayer Perceptrons... but many other books will do.
  • On maximum likelihood and the derivation of MSE: Bishop, Pattern Recognition and Machine Learning, 2007, Ch. 1.

Study guide I

  • Review your maths:
    Review: and understand the maths from Sections 2.1-2.6 of the Linear Algebra chapter of Goodfellow et al.'s Deep Learning book.
    Read: and understand the maths from Sections 3.1-3.10 of the Probability and Information Theory chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concepts and mechanics of iterative and supervised learning.
  Study: the concept of a weight update rule (not memorising the different rules, just what a weight update rule is all about).
  Study: the concept of an activation function; do not memorise the maths formulas, but have some idea about the properties of the different activation functions: hard limiting, sigmoid, tanh, ReLU, linear.
  Read: Section 4.3 of the Numerical Computation chapter of Goodfellow et al.'s Deep Learning book.

Study guide II

  Study: the concept of a loss function.
  Read: on optimisation, Sections 8.1-8.3 and 8.5 of the Optimization chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concepts of the loss surface and gradient descent.
  Read: on gradient-based learning, Section 6.3 of the Deep Feedforward Networks chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concept of training neural networks with backpropagation: how does it work, what is "error blame", and what are the necessary conditions (in terms of network architecture) for it to work?
  Study: the concept of the softmax activation function: what does it do, and why is it usually used at the output of a neural network?
  Study: the difference between regression and classification.
  Study: the difference between the mean squared error (MSE) and cross-entropy (CE) loss functions.

Study guide III

  Study: and understand the table from Slide 17: how to set up the architecture and loss function appropriately for different problems.
  • Advanced extras:
    Read: on supervised learning and maximum likelihood (MLE) inference in Ch. 2.6 and 8.2.2 of Hastie's The Elements of Statistical Learning.
    Exercise: Derive the MSE and CE errors from the MLE principle (following Slides 11 and 15).