Lecture 2: Learning in neural networks
COSC 420
Lech Szymanski
Department of Computer Science, University of Otago
March 8, 2022
Math Review: Linear algebra

- Dot product of a d-dim vector and a d-dim vector
  $$\begin{bmatrix} y_1 & \dots & y_d \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ \vdots \\ w_d \end{bmatrix} = y_1 w_1 + \dots + y_d w_d = \sum_{i=1}^{d} y_i w_i$$
- Dot product of a d-dim vector and a $d \times k$ matrix
  $$\begin{bmatrix} y_1 & \dots & y_d \end{bmatrix} \cdot \begin{bmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{d} y_i w_{i1} & \dots & \sum_{i=1}^{d} y_i w_{ik} \end{bmatrix}$$
- Transpose of a $d \times k$ matrix
  $$\begin{bmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{bmatrix}^T = \begin{bmatrix} w_{11} & \dots & w_{d1} \\ \vdots & \ddots & \vdots \\ w_{1k} & \dots & w_{dk} \end{bmatrix}$$
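As a quick sanity check of the identities above, here is a minimal NumPy sketch; the array values below are arbitrary examples of mine, not from the lecture:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])          # a d-dim vector (d = 3)
w = np.array([0.5, -1.0, 2.0])         # another d-dim vector
W = np.array([[1.0, 0.0],              # a d x k matrix (d = 3, k = 2)
              [2.0, 1.0],
              [0.0, 3.0]])

# Vector-vector dot product: y . w = sum_i y_i w_i
print(y @ w, np.sum(y * w))            # both give 1*0.5 + 2*(-1) + 3*2 = 4.5

# Vector-matrix dot product: a k-dim result with entries sum_i y_i w_ik
print(y @ W)                           # [ 5. 11.]

# Transpose swaps rows and columns: (d x k) -> (k x d)
print(W.T.shape)                       # (2, 3)
```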
Math Review: Linear algebra

- Dot product of an $n \times d$ matrix and a $d \times k$ matrix
  $$\begin{bmatrix} y_{11} & \dots & y_{1d} \\ \vdots & \ddots & \vdots \\ y_{n1} & \dots & y_{nd} \end{bmatrix} \cdot \begin{bmatrix} w_{11} & \dots & w_{1k} \\ \vdots & \ddots & \vdots \\ w_{d1} & \dots & w_{dk} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{d} y_{1i} w_{i1} & \dots & \sum_{i=1}^{d} y_{1i} w_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{d} y_{ni} w_{i1} & \dots & \sum_{i=1}^{d} y_{ni} w_{ik} \end{bmatrix}$$
- Product of a scalar and a k-dim vector
  $$\alpha \begin{bmatrix} w_1 & \dots & w_k \end{bmatrix} = \begin{bmatrix} \alpha w_1 & \dots & \alpha w_k \end{bmatrix}$$
- Sum of a k-dim vector and a k-dim vector
  $$\begin{bmatrix} v_1 & \dots & v_k \end{bmatrix} + \begin{bmatrix} b_1 & \dots & b_k \end{bmatrix} = \begin{bmatrix} v_1 + b_1 & \dots & v_k + b_k \end{bmatrix}$$
- Product of a scalar and a $d \times k$ matrix
- Sum of a $d \times k$ and a $d \times k$ matrix
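The same operations carry over to matrices; a short NumPy illustration, again with made-up example values:

```python
import numpy as np

Y = np.arange(6, dtype=float).reshape(2, 3)   # an n x d matrix (n = 2, d = 3)
W = np.ones((3, 2))                           # a d x k matrix (d = 3, k = 2)
alpha = 0.1                                   # a scalar

print(Y @ W)             # n x k matrix with entries sum_i y_ni w_ik
print(alpha * W)         # scalar times matrix: every entry is scaled by alpha
print(W + W)             # sum of two d x k matrices, element by element

v = np.array([1.0, 2.0])
b = np.array([0.5, 0.5])
print(alpha * v, v + b)  # scalar times k-dim vector, and vector + vector
```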
Math Review: Linear algebra

- Element-wise product of a d-dim and a d-dim vector
  $$\begin{bmatrix} y_1 & \dots & y_d \end{bmatrix} \odot \begin{bmatrix} w_1 & \dots & w_d \end{bmatrix} = \begin{bmatrix} y_1 w_1 & y_2 w_2 & \dots & y_d w_d \end{bmatrix}$$
- Outer product of a d-dim and a k-dim vector
  $$\begin{bmatrix} y_1 \\ \vdots \\ y_d \end{bmatrix} \otimes \begin{bmatrix} w_1 & \dots & w_k \end{bmatrix} = \begin{bmatrix} y_1 w_1 & y_1 w_2 & \dots & y_1 w_k \\ \vdots & \vdots & \ddots & \vdots \\ y_d w_1 & y_d w_2 & \dots & y_d w_k \end{bmatrix}$$
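Both products have direct NumPy counterparts; a minimal sketch with example vectors of my own choosing:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])    # d-dim vector
w = np.array([4.0, 5.0, 6.0])    # d-dim vector
u = np.array([1.0, 10.0])        # k-dim vector (k = 2)

# Element-wise (Hadamard) product: [y_1 w_1, ..., y_d w_d]
print(y * w)                     # [ 4. 10. 18.]

# Outer product: d x k matrix with entries y_i u_j
print(np.outer(y, u))            # [[ 1. 10.], [ 2. 20.], [ 3. 30.]]
```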
Math Review: Multivariate calculus

- Derivative of $y = f(x)\colon \mathbb{R} \rightarrow \mathbb{R}$
  $$\frac{dy}{dx} = \frac{df(x)}{dx} = f'(x)$$
- Partial derivative of $y = f(x_1, \dots, x_d)\colon \mathbb{R}^d \rightarrow \mathbb{R}$
  $$\frac{\partial y}{\partial x_1} = \frac{\partial f(x_1, \dots, x_d)}{\partial x_1} \quad \dots \quad \frac{\partial y}{\partial x_d} = \frac{\partial f(x_1, \dots, x_d)}{\partial x_d}$$
- Gradient of $y = f(x_1, \dots, x_d)\colon \mathbb{R}^d \rightarrow \mathbb{R}$
  $$\nabla f(x_1, \dots, x_d) = \begin{bmatrix} \frac{\partial f(x_1, \dots, x_d)}{\partial x_1} & \dots & \frac{\partial f(x_1, \dots, x_d)}{\partial x_d} \end{bmatrix}$$
- Chain rule
  $$\frac{\partial f\big(g(x)\big)}{\partial x} = \frac{\partial f\big(g(x)\big)}{\partial g(x)} \, \frac{\partial g(x)}{\partial x}$$
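One way to convince yourself of the chain rule is to compare the analytic derivative with a finite-difference estimate; a small sketch, where the choice of f and g is mine:

```python
import numpy as np

# Chain rule check on f(g(x)) with f(u) = u**2 and g(x) = sin(x):
# the analytic derivative is 2*g(x)*cos(x); compare against a central difference.
def g(x):
    return np.sin(x)

def f_of_g(x):
    return g(x) ** 2

x0 = 0.7
analytic = 2.0 * g(x0) * np.cos(x0)                    # (df/dg) * (dg/dx)
h = 1e-6
numeric = (f_of_g(x0 + h) - f_of_g(x0 - h)) / (2 * h)  # finite difference
print(analytic, numeric)                               # the two values agree closely
```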
Math Review: Probability theory

- Probability density function of random variable $X$
  $$\int_a^b p(x)\,dx = \Pr[a \leq X \leq b] \quad \text{and} \quad \int_{-\infty}^{\infty} p(x)\,dx = 1$$
- Joint probability density function of random variables $X_1, \dots, X_n$
  $$\int \dots \int p(x_1, \dots, x_n)\,dx_1 \dots dx_n = 1$$
- Independence of $n$ random variables $X_1, \dots, X_n$
  $$p(x_1, \dots, x_n) = p(x_1)\,p(x_2) \dots p(x_n) = \prod_{i=1}^{n} p(x_i)$$
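A quick numerical check of the normalisation and independence conditions, using a standard Gaussian density as my example:

```python
import numpy as np

# Check that a standard Gaussian density integrates to (almost) 1,
# mirroring the normalisation condition above.
x, dx = np.linspace(-10, 10, 100_001, retstep=True)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
print(np.sum(p) * dx)            # ~= 1.0 (simple Riemann sum)

# For independent variables the joint density factorises, p(x1, x2) = p(x1) p(x2),
# so the double integral of the product is also ~= 1.
joint = np.outer(p, p)
print(np.sum(joint) * dx * dx)   # ~= 1.0
```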
Multilayer Feed-Forward Neural Networks (MLFF)
Iterative Supervised Learning

Require: Data of N input-output pairs $(x_1, y_1), \dots, (x_N, y_N)$ and model $y^{[L]} = f(x, w)$
  $t \leftarrow 0$
  Set $w_0$
  while $t < T$ do
    $n \leftarrow 0$
    while $n < N$ do
      Evaluate $y_n^{[L]} \leftarrow f(x_n, w_t)$
      Compute $\Delta w$ based on the discrepancy between $y_n$ and $y_n^{[L]}$
      $w_{t+1} \leftarrow w_t + \Delta w$ (so that $f(x_n, w_t)$ is closer to $y_n$)
      $n \leftarrow n + 1$
    end while
    $t \leftarrow t + 1$
  end while
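A minimal Python sketch of this loop, assuming the caller supplies the model f(x, w), an update rule delta_w, and the training pairs; these names are placeholders of mine rather than definitions from the lecture:

```python
import numpy as np

def train(data, f, delta_w, w0, T):
    """Iterative supervised learning: sweep the dataset T times,
    nudging the weights after every (x_n, y_n) pair.

    data    -- list of (x_n, y_n) input-output pairs
    f       -- model: f(x, w) -> network output y^[L]
    delta_w -- update rule: delta_w(x, y, y_out, w) -> weight change
    w0      -- initial weights (e.g. a NumPy array)
    """
    w = w0
    for t in range(T):                           # outer loop over sweeps
        for x_n, y_n in data:                    # inner loop over training pairs
            y_out = f(x_n, w)                    # evaluate y^[L] = f(x_n, w_t)
            dw = delta_w(x_n, y_n, y_out, w)     # change based on the discrepancy
            w = w + dw                           # w_{t+1} = w_t + delta_w
    return w
```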
Hebbian learning

$$w_{ij} := w_{ij} + y_i\, y_j \tag{1}$$
Perceptron learning rule

$$w_{ij} := w_{ij} + y_i \left( y_j - y_j^{[L]} \right) \tag{2}$$
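A small NumPy sketch of these two update rules acting on a single layer of weights; the toy inputs, targets, and the hard-limiting output used for the perceptron are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
y_in = np.array([1.0, 0.0, 1.0])        # pre-synaptic outputs y_i
target = np.array([1.0, 0.0])           # desired post-synaptic outputs y_j
W = rng.normal(0.0, 0.5, (3, 2))        # weights w_ij (3 inputs, 2 outputs)

# Hebbian update (eq. 1): strengthen w_ij when y_i and y_j are active together
W_hebb = W + np.outer(y_in, target)

# Perceptron update (eq. 2): move weights by the input times the output error
output = np.heaviside(y_in @ W, 0.0)    # hard-limited layer output
W_perc = W + np.outer(y_in, target - output)

print(W_hebb, W_perc, sep="\n")
```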
Delta rule

$$w_{ij}^{[l]} := w_{ij}^{[l]} + y_i^{[l-1]} \left( \frac{\partial y_k^{[L]}}{\partial v_j^{[l]}} \right) \left( y_k - y_k^{[L]} \right) \tag{3}$$

$$w_{ij}^{[l]} := w_{ij}^{[l]} + y_i^{[l-1]} \left( \frac{\partial y_j^{[l]}}{\partial v_j^{[l]}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial y_j^{[l]}} \right) \dots \left( \frac{\partial y_o^{[L-1]}}{\partial y_n^{[L-2]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial y_o^{[L-1]}} \right) \left( y_k - y_k^{[L]} \right) \tag{4}$$

$$w_{ij}^{[l]} := w_{ij}^{[l]} + y_i^{[l-1]} \left( \frac{\partial y_j^{[l]}}{\partial v_j^{[l]}} \right) \left( \frac{\partial v_m^{[l+1]}}{\partial y_j^{[l]}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial v_m^{[l+1]}} \right) \dots \left( \frac{\partial v_o^{[L-1]}}{\partial y_n^{[L-2]}} \right) \left( \frac{\partial y_o^{[L-1]}}{\partial v_o^{[L-1]}} \right) \left( \frac{\partial v_k^{[L]}}{\partial y_o^{[L-1]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial v_k^{[L]}} \right) \left( y_k - y_k^{[L]} \right) \tag{5}$$
Activation functions

$$\frac{\partial y_k}{\partial y_i} = \frac{\partial y_k}{\partial v_k} \, \frac{\partial v_k}{\partial y_i} \tag{6}$$

$$\frac{\partial y_k}{\partial v_k} = \frac{\partial \varphi(v_k)}{\partial v_k} \tag{7}$$
Name    | Function | Derivative
Hardlim | $\varphi(v_k) = \begin{cases} 0 & v_k \leq 0 \\ 1 & v_k > 0 \end{cases}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = 0$
Sigmoid | $\varphi(v_k) = \frac{1}{1 + e^{-v_k}}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = \varphi(v_k)\big(1 - \varphi(v_k)\big)$
Tanh    | $\varphi(v_k) = \frac{e^{v_k} - e^{-v_k}}{e^{v_k} + e^{-v_k}}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = 1 - \varphi(v_k)^2$
ReLU    | $\varphi(v_k) = \begin{cases} 0 & v_k \leq 0 \\ v_k & v_k > 0 \end{cases}$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = \begin{cases} 0 & v_k \leq 0 \\ 1 & v_k > 0 \end{cases}$
Linear  | $\varphi(v_k) = v_k$ | $\frac{\partial \varphi(v_k)}{\partial v_k} = 1$
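The functions and derivatives in the table translate directly into NumPy; a minimal sketch, applied element-wise to an example activity vector:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_deriv(v):
    s = sigmoid(v)
    return s * (1.0 - s)               # phi(v) * (1 - phi(v))

def tanh_deriv(v):
    return 1.0 - np.tanh(v) ** 2       # 1 - phi(v)^2

def relu(v):
    return np.where(v > 0, v, 0.0)

def relu_deriv(v):
    return np.where(v > 0, 1.0, 0.0)

def hardlim(v):
    return np.where(v > 0, 1.0, 0.0)   # its derivative is 0 wherever it exists

v = np.array([-2.0, 0.0, 2.0])
print(sigmoid(v), sigmoid_deriv(v), relu(v), relu_deriv(v))
```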
MSE loss

$$e_k = y_k - y_k^{[L]} \tag{8}$$

$$\mathcal{L}\big(E_k \,|\, X, Y_k\big) = \prod_{n=1}^{N} p(e_{k,n}) \tag{9}$$

$$p(e_k) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(e_k)^2}{2} \right) \tag{10}$$

$$\ln \mathcal{L}\big(E_k \,|\, X, Y_k\big) = \sum_{n=1}^{N} \ln p(e_{k,n}) \tag{11}$$

$$\operatorname*{arg\,min}_{w} \Big[ -\ln \mathcal{L}\big(E_k \,|\, X, Y_k\big) \Big] = \operatorname*{arg\,min}_{w} \left[ -\sum_{n=1}^{N} \ln p(e_{k,n}) \right] \tag{12}$$

$$= \operatorname*{arg\,min}_{w} \left[ \sum_{n=1}^{N} e_{k,n}^2 \right] \tag{13}$$

$$J_k = \sum_{n=1}^{N} \left( y_{k,n} - y_{k,n}^{[L]} \right)^2 \tag{14}$$
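A short numerical illustration of why minimising the negative log-likelihood under the unit-Gaussian error model comes down to minimising the sum of squared errors; the targets and outputs below are made-up numbers:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])     # desired outputs y_{k,n}
y_out = np.array([0.8, 0.2, 0.6, 0.9])      # network outputs y^[L]_{k,n}
e = y_true - y_out                          # errors e_{k,n}  (eq. 8)

sse = np.sum(e ** 2)                        # sum-of-squares objective (eq. 14)

# Negative log-likelihood under p(e) = exp(-e^2/2) / sqrt(2*pi):
nll = -np.sum(-0.5 * e ** 2 - 0.5 * np.log(2 * np.pi))

# The two objectives differ only by constants that do not depend on the weights.
print(sse, 2 * nll - len(e) * np.log(2 * np.pi))   # identical values
```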
Loss surface

Gradient descent

$$J_k = \sum_{n=1}^{N} \left( y_{k,n} - y_{k,n}^{[L]} \right)^2 \tag{14}$$

$$\frac{\partial J_k}{\partial w_{ij}} = -2 \sum_{n=1}^{N} \frac{\partial y_{k,n}^{[L]}}{\partial w_{ij}} \left( y_{k,n} - y_{k,n}^{[L]} \right) \tag{15}$$

$$w := w - \epsilon \left( \nabla_w J_k \right) := w - \epsilon \begin{bmatrix} \frac{\partial J_k}{\partial w_1} & \dots & \frac{\partial J_k}{\partial w_{ij}} & \dots & \frac{\partial J_k}{\partial w_U} \end{bmatrix} \tag{16}$$

$$w_{ij} := w_{ij} - \epsilon \frac{\partial J_k}{\partial w_{ij}} \tag{17}$$

$$\frac{\partial y_k^{[L]}}{\partial w_{ij}} = \left( \frac{\partial y_j^{[l]}}{\partial w_{ij}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial y_j^{[l]}} \right) \dots \left( \frac{\partial y_n^{[L-1]}}{\partial y_m^{[L-2]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial y_n^{[L-1]}} \right) = y_i^{[l-1]} \left( \frac{\partial y_j^{[l]}}{\partial v_j^{[l]}} \right) \left( \frac{\partial y_m^{[l+1]}}{\partial y_j^{[l]}} \right) \dots \left( \frac{\partial y_n^{[L-1]}}{\partial y_m^{[L-2]}} \right) \left( \frac{\partial y_k^{[L]}}{\partial y_n^{[L-1]}} \right) \tag{18}$$
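A minimal gradient-descent sketch on a toy quadratic loss, just to show the update of equations 16 and 17 in code; the loss, starting point and learning rate are my own choices:

```python
import numpy as np

def J(w):
    return np.sum((w - 3.0) ** 2)        # toy loss surface with its minimum at w = 3

def grad_J(w):
    return 2.0 * (w - 3.0)               # gradient of the toy loss

w = np.array([0.0, 10.0])                # initial weights
eps = 0.1                                # learning rate (epsilon in eq. 16/17)
for step in range(100):
    w = w - eps * grad_J(w)              # w := w - eps * grad_w J
print(w, J(w))                           # w ends up close to [3, 3], loss near 0
```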
Backpropagation (with MSE loss)

$$y_L = \varphi_L\bigg( \varphi_{L-1}\Big( \dots \varphi_2\big( \varphi_1( x W_1 + b_1 ) W_2 + b_2 \big) \dots \Big) W_L + b_L \bigg) \tag{19}$$

$$\begin{aligned}
W_L &:= W_L + y_{L-1} \otimes \left[ (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right] \\
b_L &:= b_L + \left[ (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right] \\
W_{L-1} &:= W_{L-1} + y_{L-2} \otimes \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \right] \\
b_{L-1} &:= b_{L-1} + \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \right] \\
&\;\;\vdots \\
W_l &:= W_l + y_{l-1} \otimes \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \cdots W_{l+1}^T \odot \tfrac{\partial \varphi_l(v_l)}{\partial v_l} \right] \\
b_l &:= b_l + \left[ \left( (y - y_L) \odot \tfrac{\partial \varphi_L(v_L)}{\partial v_L} \right) \cdot W_L^T \odot \tfrac{\partial \varphi_{L-1}(v_{L-1})}{\partial v_{L-1}} \cdots W_{l+1}^T \odot \tfrac{\partial \varphi_l(v_l)}{\partial v_l} \right]
\end{aligned} \tag{20}$$
Backpropagation (with MSE loss)

where $U_l$ is the number of neurons in layer $l$:

- $v_l = \begin{bmatrix} v_1^{[l]} & \dots & v_{U_l}^{[l]} \end{bmatrix}$ is the $1 \times U_l$ activity vector of layer $l$.
- $y_l = \begin{bmatrix} y_1^{[l]} & \dots & y_{U_l}^{[l]} \end{bmatrix}$ is the $1 \times U_l$ output vector of layer $l$.
- $\frac{\partial \varphi_l(v_l)}{\partial v_l} = \begin{bmatrix} \frac{\partial \varphi_l(v_1^{[l]})}{\partial v_1^{[l]}} & \dots & \frac{\partial \varphi_l(v_{U_l}^{[l]})}{\partial v_{U_l}^{[l]}} \end{bmatrix}$
- $W_l = \begin{bmatrix} w_{11}^{[l]} & \dots & w_{1 U_l}^{[l]} \\ \vdots & \ddots & \vdots \\ w_{U_{l-1} 1}^{[l]} & \dots & w_{U_{l-1} U_l}^{[l]} \end{bmatrix}$ is the $U_{l-1} \times U_l$ matrix of weights in layer $l$.
- $b_l = \begin{bmatrix} b_1^{[l]} & \dots & b_{U_l}^{[l]} \end{bmatrix}$ is the $1 \times U_l$ vector of bias values in layer $l$.
- $v_l = y_{l-1} \cdot W_l + b_l$
- $y_i^{[l]} = \varphi_l(v_i^{[l]})$
- $y_0 = x$ is the network input
- $y$ is the desired network output
- $W_l^T$ is the transpose of $W_l$
- $\cdot$ denotes the dot product
- $\otimes$ denotes the outer product
- $\odot$ denotes the element-wise product
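Putting equations 19 and 20 together with this notation, here is a compact NumPy sketch of backpropagation for a small fully connected network with sigmoid layers; the architecture, toy OR dataset and learning rate are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [2, 4, 1]                                   # U_0 (input), U_1 (hidden), U_2 (output)
Ws = [rng.normal(0.0, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros((1, b)) for b in sizes[1:]]

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, y, Ws, bs, eps=0.5):
    # Forward pass: v_l = y_{l-1} . W_l + b_l,  y_l = phi_l(v_l)   (cf. eq. 19)
    ys = [x]
    for W, b in zip(Ws, bs):
        ys.append(sigmoid(ys[-1] @ W + b))
    # Backward pass: start from (y - y_L) (.) phi'_L(v_L), then push it back
    # through W_l^T and multiply by phi'_l(v_l) at each layer      (cf. eq. 20)
    delta = (y - ys[-1]) * ys[-1] * (1.0 - ys[-1])
    for l in reversed(range(len(Ws))):
        W_old = Ws[l].copy()               # keep pre-update weights for the chain
        Ws[l] += eps * ys[l].T @ delta     # outer product y_{l-1} (x) delta
        bs[l] += eps * delta
        if l > 0:
            delta = (delta @ W_old.T) * ys[l] * (1.0 - ys[l])

# Illustration: a few sweeps over a toy OR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [1]], dtype=float)
for _ in range(2000):
    for x_n, y_n in zip(X, Y):
        backprop_step(x_n[None, :], y_n[None, :], Ws, bs)
print(sigmoid(sigmoid(X @ Ws[0] + bs[0]) @ Ws[1] + bs[1]))   # approaches [0, 1, 1, 1]
```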
Cross-entropy (CE) loss

$$P(y_k = 1) = y_k^{[L]} \tag{21}$$

$$P(y_k = 0) = 1 - y_k^{[L]} \tag{22}$$

$$\mathcal{L}\big(y_k; y_k^{[L]} \,|\, X, Y_k\big) = \prod_{n=1}^{N} p\big(y_{k,n}; y_{k,n}^{[L]}\big) \tag{23}$$

$$p\big(y_k; y_k^{[L]}\big) = \big(y_k^{[L]}\big)^{y_k} \big(1 - y_k^{[L]}\big)^{(1 - y_k)} \tag{24}$$

$$\ln \mathcal{L}\big(y_k; y_k^{[L]} \,|\, X, Y_k\big) = \sum_{n=1}^{N} \ln p\big(y_{k,n}; y_{k,n}^{[L]}\big) \tag{25}$$

$$\operatorname*{arg\,min}_{w} \Big[ -\ln \mathcal{L}\big(y_k; y_k^{[L]} \,|\, X, Y_k\big) \Big] = \operatorname*{arg\,min}_{w} \left[ -\sum_{n=1}^{N} \ln p\big(y_{k,n}; y_{k,n}^{[L]}\big) \right] \tag{26}$$

$$= \operatorname*{arg\,min}_{w} \left[ -\sum_{n=1}^{N} \Big( y_{k,n} \ln\big(y_{k,n}^{[L]}\big) + (1 - y_{k,n}) \ln\big(1 - y_{k,n}^{[L]}\big) \Big) \right] \tag{27}$$
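The objective in the last line is the binary cross-entropy loss; written out in NumPy with arbitrary example targets and outputs:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])      # class labels y_{k,n} in {0, 1}
y_out = np.array([0.9, 0.1, 0.7, 0.6])       # network outputs y^[L]_{k,n} in (0, 1)

# Negative log-likelihood of the Bernoulli model = cross-entropy loss
ce = -np.sum(y_true * np.log(y_out) + (1.0 - y_true) * np.log(1.0 - y_out))
print(ce)                                     # smaller when y_out matches y_true
```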
Backpropagation (with CE loss)

Just replace $(y - y_L) \odot \frac{\partial \varphi_L(v_L)}{\partial v_L}$ with $y \odot (1 - y_L) - (1 - y) \odot y_L$ in equation 20; everything else stays the same.
Softmax cross-entropy loss

Name    | Function | Derivative
Softmax | $\varphi(v_k) = \frac{e^{v_k}}{\sum_j e^{v_j}}$ | $\frac{\partial \varphi(v_k)}{\partial v_j} = \begin{cases} \varphi(v_k)\big(1 - \varphi(v_k)\big) & j = k \\ -\varphi(v_j)\,\varphi(v_k) & j \neq k \end{cases}$
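A NumPy sketch of the softmax function and its derivative (as a Jacobian matrix), matching the two cases in the table; the input vector is an arbitrary example:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))          # shift by the max for numerical stability
    return e / np.sum(e)

def softmax_jacobian(v):
    s = softmax(v)
    # d phi(v_k) / d v_j = phi(v_k)(1 - phi(v_k)) if j == k, else -phi(v_j) phi(v_k)
    return np.diag(s) - np.outer(s, s)

v = np.array([1.0, 2.0, 0.5])
print(softmax(v))                      # positive entries that sum to 1
print(softmax_jacobian(v))             # diagonal s_k(1 - s_k), off-diagonal -s_j s_k
```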
Classification vs. regression

Problem type                 | Num. outputs | Loss | Output layer activation function
Regression                   | Single       | MSE  | Linear
Regression                   | Multi        | MSE  | Linear
Classification (categorical) | Single       | CE   | Sigmoid
Classification (categorical) | Multi        | CE   | Softmax
Classification (multi-label) | Multi        | CE   | Sigmoid

Classification with 2 classes is done with a single output; for more than 2 classes, multiple outputs are required.
Summary

- Math review: basic concepts from linear algebra, multivariate calculus and probability theory
- Mathematical expression for the output of a feed-forward neural network (MLFF)
- Iterative learning algorithm
- Rules for updating network parameters: Hebbian, perceptron, delta
- Generalisation of the delta rule for different activation functions and different loss functions
- Derivation of the delta rule from optimisation and the gradient descent principle
- Backpropagation - efficient computation of the delta rule
- Mean squared error loss for regression vs. cross-entropy loss for classification
- Softmax activation for categorical classification
References...

Here are the references I relied on for this lecture:

- On supervised Hebbian learning and the perceptron learning rule: Hagan et al., Neural Network Design, https://hagan.okstate.edu/NNDesign.pdf, 2014, Ch. 7 and Ch. 4 respectively.
- On backpropagation: Haykin, Neural Networks: A Comprehensive Foundation, 1998, Ch. 4 (Multilayer Perceptrons)... but many other books will do.
- On maximum likelihood and the derivation of MSE: Bishop, Pattern Recognition and Machine Learning, 2007, Ch. 1.
Study guide I

- Review your maths:
  Review: and understand the maths from Sections 2.1 - 2.6 of the Linear Algebra chapter of Goodfellow et al.'s Deep Learning book.
  Read: and understand the maths from Sections 3.1 - 3.10 of the Probability and Information Theory chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concepts and mechanics of iterative and supervised learning.
  Study: the concept of a weight update rule (not memorising different rules, just what a weight update rule is all about).
  Study: the concept of an activation function; do not memorise the maths formulas, but have some idea about the properties of different activation functions: hard limiting, sigmoid, tanh, ReLU, linear.
  Read: Section 4.3 of the Numerical Computation chapter of Goodfellow et al.'s Deep Learning book.
Study guide II

  Study: the concept of a loss function.
  Read: on optimisation, Sections 8.1 - 8.3 and 8.5 of the Optimization chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concepts of a loss surface and gradient descent.
  Read: on gradient-based learning in Section 6.3 of the Deep Feedforward Networks chapter of Goodfellow et al.'s Deep Learning book.
  Study: the concept of training neural networks with backpropagation - how does it work, what's error blame, and what are the necessary conditions (in terms of network architecture) for it to work?
  Study: the concept of the softmax activation function - what does it do, and why is it usually used at the output of a neural network?
  Study: the difference between regression and classification.
  Study: the difference between the mean squared error (MSE) and cross-entropy (CE) loss functions.
Study guide III

  Study: and understand the table from Slide 17 - how to set up the architecture and loss function appropriately for different problems.
- Advanced extras:
  Read: on supervised learning and maximum likelihood (MLE) inference in Ch. 2.6 and 8.2.2 of Hastie's The Elements of Statistical Learning.
  Exercise: Derive the MSE and CE errors from the MLE principle (following Slides 11 and 15).