Bayesian Deep Learning (Sections 2.2-2.4)

catla
January 17, 2020

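Slides covering Sections 2.2 through 2.4 of "Bayesian Deep Learning" (ベイズ深層学習) by Atsushi Suyama (須山 敦志).

The backpropagation calculation for RNNs is summarized at the link below.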
https://drive.google.com/file/d/1QkGb-Ra_E5PRSzq2YOzhax524GtBIRgO/view?usp=sharing

Transcript

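  1. Feedforward neural networks

    A two-layer feedforward neural network can be modeled as follows.

    $$y_n = W^{(2)} \Phi(W^{(1)} x_n) + \epsilon_n$$

    Input: $x_n \in \mathbb{R}^{H_0}$
    Label: $y_n \in \mathbb{R}^{D}$
    Noise: $\epsilon_n \in \mathbb{R}^{D}$
    First-layer parameters: $W^{(1)} \in \mathbb{R}^{H_1 \times H_0}$
    Second-layer parameters: $W^{(2)} \in \mathbb{R}^{D \times H_1}$
    Nonlinear function (activation function): $\Phi(\cdot)$

    Writing $z_n = \Phi(W^{(1)} x_n)$, each element $z_{n,h_1} \in \mathbb{R}$ is called a hidden unit. The weighted sum taken over the hidden units or the inputs is called the activation.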
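As a rough illustration of the two-layer model above, the following sketch evaluates the noise-free part $W^{(2)} \Phi(W^{(1)} x_n)$ in plain Python. The choice of `tanh` for $\Phi$, the layer sizes, and the weight values are assumptions made only for this example; the slides do not fix any of them.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def forward_two_layer(W1, W2, x, phi=math.tanh):
    """Return (W2 * phi(W1 * x), z), i.e. the model output without noise."""
    a = matvec(W1, x)              # activations of the hidden layer
    z = [phi(a_h) for a_h in a]    # hidden units z = phi(W1 x)
    return matvec(W2, z), z

# Hypothetical sizes: H0 = 2 inputs, H1 = 3 hidden units, D = 1 output.
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
W2 = [[1.0, 1.0, 1.0]]            # all ones, so the output is the sum of z
y_mean, z = forward_two_layer(W1, W2, [1.0, 2.0])
```

With every element of `W2` set to 1 the output is simply the sum of the hidden units, which is the special case the next slide relates to a GLM.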
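  2. Feedforward neural networks

    Universal approximation theorem: a two-layer feedforward neural network (one hidden layer) can approximate any continuous function by increasing the number of hidden units.

    Feedforward neural networks with multiple layers:

    $$y_n = W^{(L)} \Phi(W^{(L-1)} \cdots \Phi(W^{(1)} x_n) \cdots) + \epsilon_n$$

    Models with a deep network structure of three or more layers are generally called deep learning. (In this book, two-layer neural networks are also treated as deep learning models.)

    In a two-layer neural network, fixing every element of $W^{(2)}$ to 1 makes the model coincide with a generalized linear model (GLM). In that case the link function corresponds to the inverse of the activation function.

    $\Longrightarrow$ A multilayer neural network model can therefore be interpreted as a model that repeatedly applies the nonlinear transformation of a GLM.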
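  3. Gradient descent and the Newton-Raphson method

    Problem: in a feedforward neural network the parameters to be learned sit inside nonlinear functions, so the minimizer cannot be obtained analytically.

    Solution: introduce an optimization method that finds the minimum numerically on a computer. The most widely used such method is gradient descent.

    Let $E(w)$ be the error function of a model with an $M$-dimensional parameter $w$, and define the gradient as

    $$\nabla_w E(w) = \frac{\partial E(w)}{\partial w} = \left( \frac{\partial E(w)}{\partial w_1}, \frac{\partial E(w)}{\partial w_2}, \ldots, \frac{\partial E(w)}{\partial w_M} \right)^{\top}$$

    This is the direction in which the error function increases most steeply within a neighborhood measured by Euclidean distance. Gradient descent gives the parameter $w$ a suitable initial value and optimizes it by repeatedly moving it in the direction opposite to the gradient.

    $$w_{\mathrm{new}} = w_{\mathrm{old}} - \alpha \nabla_w E(w) \big|_{w = w_{\mathrm{old}}} \quad (\alpha > 0)$$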
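The update rule $w_{\mathrm{new}} = w_{\mathrm{old}} - \alpha \nabla_w E(w)$ can be sketched in a few lines. The quadratic error function, the learning rate, and the step count below are illustrative assumptions, not values from the slides.

```python
def gradient_descent(grad, w0, alpha=0.1, steps=200):
    """Repeat w_new = w_old - alpha * grad(w_old) for a fixed number of steps."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [w_i - alpha * g_i for w_i, g_i in zip(w, g)]
    return w

# Hypothetical error function: E(w) = (w1 - 3)^2 + 2 * (w2 + 1)^2,
# whose minimum is at w = (3, -1).
def grad_E(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]

w_star = gradient_descent(grad_E, [0.0, 0.0])
```

On this toy problem the iterate contracts toward (3, -1) at every step; with a learning rate that is too large for the curvature the same loop would diverge.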
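  4. Gradient descent and the Newton-Raphson method

    When the number of parameters $M$ is not large, the optimization can also be made more efficient by using the second derivatives of the error function.

    $\rightarrow$ The Newton-Raphson method is the representative example.

    Approximating the error function to be minimized to second order with a Taylor expansion around some point $\bar{w}$ gives

    $$E(w) \approx \tilde{E}(w) = E(\bar{w}) + \nabla_w E(w) \big|_{w=\bar{w}}^{\top} (w - \bar{w}) + \frac{1}{2} (w - \bar{w})^{\top} \nabla_w^2 E(w) \big|_{w=\bar{w}} (w - \bar{w})$$

    $\nabla_w^2 E$ is the Hessian matrix of the error function $E$, and it is therefore a symmetric matrix.

    $$H = \nabla_w^2 E(w) = \begin{pmatrix} \dfrac{\partial^2 E(w)}{\partial w_1^2} & \cdots & \dfrac{\partial^2 E(w)}{\partial w_1 \partial w_M} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 E(w)}{\partial w_M \partial w_1} & \cdots & \dfrac{\partial^2 E(w)}{\partial w_M^2} \end{pmatrix}$$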
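  5. Gradient descent and the Newton-Raphson method

    $$E(w) \approx \tilde{E}(w) = E(\bar{w}) + \nabla_w E(w) \big|_{w=\bar{w}}^{\top} (w - \bar{w}) + \frac{1}{2} (w - \bar{w})^{\top} \nabla_w^2 E(w) \big|_{w=\bar{w}} (w - \bar{w})$$

    Set $A = \nabla_w E(w) \big|_{w=\bar{w}}$ and $B = \nabla_w^2 E(w) \big|_{w=\bar{w}}$, and compute the gradient $\nabla \tilde{E}(w) = \partial \tilde{E}(w) / \partial w$ of $\tilde{E}(w)$:

    $$\tilde{E}(w) = E(\bar{w}) + A^{\top} (w - \bar{w}) + \frac{1}{2} (w - \bar{w})^{\top} B (w - \bar{w})$$

    $$\frac{\partial \tilde{E}(w)}{\partial w} = \frac{\partial E(\bar{w})}{\partial w} + \frac{\partial A^{\top} (w - \bar{w})}{\partial (w - \bar{w})} \cdot \frac{\partial (w - \bar{w})}{\partial w} + \frac{1}{2} \cdot \frac{\partial (w - \bar{w})^{\top} B (w - \bar{w})}{\partial (w - \bar{w})} \cdot \frac{\partial (w - \bar{w})}{\partial w}$$

    $$= 0 + (A^{\top})^{\top} + \frac{1}{2} \cdot 2 B (w - \bar{w}) = A + B (w - \bar{w})$$

    The last step uses the identity $\partial (x^{\top} S x) / \partial x = 2 S x$, which holds when the matrix $S$ is symmetric (here $S = B$).

    Setting $\nabla \tilde{E}(w) = 0$ and solving for $w$:

    $$\nabla \tilde{E}(w) = A + B (w - \bar{w}) = A + B w - B \bar{w} = 0 \iff B w = B \bar{w} - A$$

    $$w = B^{-1} B \bar{w} - B^{-1} A = \bar{w} - B^{-1} A \quad (\text{assuming } B \text{ is a regular, i.e. invertible, matrix})$$

    This yields

    $$w = \bar{w} - \left\{ \nabla_w^2 E(w) \big|_{w=\bar{w}} \right\}^{-1} \nabla_w E(w) \big|_{w=\bar{w}}$$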
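  6. Gradient descent and the Newton-Raphson method

    Given

    $$w = \bar{w} - \left\{ \nabla_w^2 E(w) \big|_{w=\bar{w}} \right\}^{-1} \nabla_w E(w) \big|_{w=\bar{w}}$$

    set $w_{\mathrm{old}} = \bar{w}$. Then $E(w)$ can be minimized by repeating the following update.

    $$w_{\mathrm{new}} = w_{\mathrm{old}} - \left\{ \nabla_w^2 E(w) \big|_{w=w_{\mathrm{old}}} \right\}^{-1} \nabla_w E(w) \big|_{w=w_{\mathrm{old}}}$$

    Compared with optimization by gradient descent:

    Gradient descent: $w_{\mathrm{new}} = w_{\mathrm{old}} - \alpha \nabla_w E(w) \big|_{w=w_{\mathrm{old}}}$
    Newton-Raphson method: $w_{\mathrm{new}} = w_{\mathrm{old}} - \left\{ \nabla_w^2 E(w) \big|_{w=w_{\mathrm{old}}} \right\}^{-1} \nabla_w E(w) \big|_{w=w_{\mathrm{old}}}$

    The learning rate $\alpha$ of gradient descent corresponds to the inverse of the Hessian matrix, $\left\{ \nabla_w^2 E(w) \big|_{w=w_{\mathrm{old}}} \right\}^{-1}$.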
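To make the Newton-Raphson update concrete, here is a minimal sketch for the case $M = 2$, where the Hessian can be inverted in closed form. The error function $E(w) = e^{w_1} - w_1 + w_2^2$ is a hypothetical convex example chosen for illustration, and the code assumes the Hessian stays invertible along the path.

```python
import math

def newton_raphson_2d(grad, hess, w0, steps=8):
    """Iterate w_new = w_old - H(w_old)^{-1} grad(w_old) for M = 2."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        (h00, h01), (h10, h11) = hess(w)
        det = h00 * h11 - h01 * h10   # assumes the Hessian is invertible
        # Closed-form product H^{-1} g for a 2x2 matrix.
        step = [(h11 * g[0] - h01 * g[1]) / det,
                (h00 * g[1] - h10 * g[0]) / det]
        w = [w[0] - step[0], w[1] - step[1]]
    return w

# Hypothetical convex error: E(w) = exp(w1) - w1 + w2^2, minimized at (0, 0).
def grad_E(w):
    return [math.exp(w[0]) - 1.0, 2.0 * w[1]]

def hess_E(w):
    return [[math.exp(w[0]), 0.0], [0.0, 2.0]]

w_newton = newton_raphson_2d(grad_E, hess_E, [1.0, 2.0])
```

The quadratic coordinate $w_2$ reaches its minimum in a single step, because the second-order Taylor approximation is exact there; the non-quadratic coordinate $w_1$ needs a few more iterations. For large $M$ the Hessian inverse becomes the expensive part, which is why the slides restrict this method to models with few parameters.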
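  7. Error backpropagation

    The error backpropagation method is one of the representative learning methods for feedforward neural networks.

    An $L$-layer feedforward neural network is written as follows.

    $$y_n = W^{(L)} \phi(W^{(L-1)} \cdots \phi(W^{(1)} x_n) \cdots) + \epsilon_n$$

    $$y_{n,d} = \sum_{h_{L-1}=1}^{H_{L-1}} w^{(L)}_{d,h_{L-1}} \, \phi\!\left( \sum_{h_{L-2}=1}^{H_{L-2}} w^{(L-1)}_{h_{L-1},h_{L-2}} \cdots \phi\!\left( \sum_{h_0=1}^{H_0} w^{(1)}_{h_1,h_0} x_{n,h_0} \right) \right) + \epsilon_{n,d}$$

    $$y_{n,d} = a^{(L)}_{n,d} + \epsilon_{n,d}$$

    $$a^{(L)}_{n,d} = \sum_{h_{L-1}=1}^{H_{L-1}} w^{(L)}_{d,h_{L-1}} z^{(L-1)}_{n,h_{L-1}}, \qquad z^{(L-1)}_{n,h_{L-1}} = \phi(a^{(L-1)}_{n,h_{L-1}})$$

    $$\vdots$$

    $$a^{(l)}_{n,h_l} = \sum_{h_{l-1}=1}^{H_{l-1}} w^{(l)}_{h_l,h_{l-1}} z^{(l-1)}_{n,h_{l-1}}, \qquad z^{(l-1)}_{n,h_{l-1}} = \phi(a^{(l-1)}_{n,h_{l-1}})$$

    $$\vdots$$

    $$a^{(1)}_{n,h_1} = \sum_{h_0=1}^{H_0} w^{(1)}_{h_1,h_0} z^{(0)}_{n,h_0}, \qquad z^{(0)}_{n,h_0} = x_{n,h_0}$$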
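  8. Error backpropagation

    Let $W$ be the set of parameters of the network above and $N$ the number of training examples, and define the error function as

    $$E(W) = \sum_{n=1}^{N} E_n(W) = \sum_{n=1}^{N} \left( \frac{1}{2} \sum_{d=1}^{D} (y_{n,d} - a^{(L)}_{n,d})^2 \right)$$

    We differentiate $E_n(W) = \frac{1}{2} \sum_{d=1}^{D} (y_{n,d} - a^{(L)}_{n,d})^2$. The forward and backward computations for each layer are as follows.

    Layer $L$

    Forward:

    $$a^{(L)}_{n,d} = \sum_{h_{L-1}=1}^{H_{L-1}} w^{(L)}_{d,h_{L-1}} z^{(L-1)}_{n,h_{L-1}} = \sum_{h_{L-1}=1}^{H_{L-1}} w^{(L)}_{d,h_{L-1}} \phi(a^{(L-1)}_{n,h_{L-1}})$$

    Backward:

    $$\frac{\partial E_n}{\partial w^{(L)}_{d,h_{L-1}}} = \frac{\partial E_n}{\partial a^{(L)}_{n,d}} \frac{\partial a^{(L)}_{n,d}}{\partial w^{(L)}_{d,h_{L-1}}} = (a^{(L)}_{n,d} - y_{n,d}) \, z^{(L-1)}_{n,h_{L-1}} = \delta^{(L)}_{n,d} \, z^{(L-1)}_{n,h_{L-1}}$$

    Layer $L-1$

    Forward:

    $$z^{(L-1)}_{n,h_{L-1}} = \phi(a^{(L-1)}_{n,h_{L-1}}), \qquad a^{(L-1)}_{n,h_{L-1}} = \sum_{h_{L-2}=1}^{H_{L-2}} w^{(L-1)}_{h_{L-1},h_{L-2}} z^{(L-2)}_{n,h_{L-2}}$$

    Backward:

    $$\frac{\partial E_n}{\partial w^{(L-1)}_{h_{L-1},h_{L-2}}} = \sum_{d=1}^{D} \frac{\partial E_n}{\partial a^{(L)}_{n,d}} \frac{\partial a^{(L)}_{n,d}}{\partial a^{(L-1)}_{n,h_{L-1}}} \frac{\partial a^{(L-1)}_{n,h_{L-1}}}{\partial w^{(L-1)}_{h_{L-1},h_{L-2}}}$$

    where

    $$\frac{\partial a^{(L)}_{n,d}}{\partial a^{(L-1)}_{n,h_{L-1}}} = \frac{\partial}{\partial a^{(L-1)}_{n,h_{L-1}}} \left( \sum_{h=1}^{H_{L-1}} w^{(L)}_{d,h} \phi(a^{(L-1)}_{n,h}) \right) = w^{(L)}_{d,h_{L-1}} \phi'(a^{(L-1)}_{n,h_{L-1}})$$

    $$\frac{\partial a^{(L-1)}_{n,h_{L-1}}}{\partial w^{(L-1)}_{h_{L-1},h_{L-2}}} = \frac{\partial}{\partial w^{(L-1)}_{h_{L-1},h_{L-2}}} \sum_{h=1}^{H_{L-2}} w^{(L-1)}_{h_{L-1},h} z^{(L-2)}_{n,h} = z^{(L-2)}_{n,h_{L-2}}$$

    so that

    $$\frac{\partial E_n}{\partial w^{(L-1)}_{h_{L-1},h_{L-2}}} = \sum_{d=1}^{D} \delta^{(L)}_{n,d} \left( w^{(L)}_{d,h_{L-1}} \phi'(a^{(L-1)}_{n,h_{L-1}}) \right) z^{(L-2)}_{n,h_{L-2}} = \phi'(a^{(L-1)}_{n,h_{L-1}}) \left( \sum_{d=1}^{D} \delta^{(L)}_{n,d} w^{(L)}_{d,h_{L-1}} \right) z^{(L-2)}_{n,h_{L-2}} = \delta^{(L-1)}_{n,h_{L-1}} z^{(L-2)}_{n,h_{L-2}}$$

    Layer $l$

    Forward:

    $$a^{(l)}_{n,h_l} = \sum_{h_{l-1}=1}^{H_{l-1}} w^{(l)}_{h_l,h_{l-1}} z^{(l-1)}_{n,h_{l-1}}, \qquad z^{(l)}_{n,h_l} = \phi(a^{(l)}_{n,h_l})$$

    Backward:

    $$\delta^{(l)}_{n,h_l} = \begin{cases} a^{(L)}_{n,h_l} - y_{n,h_l} & \text{if } l = L \\ \phi'(a^{(l)}_{n,h_l}) \sum_{h=1}^{H_{l+1}} \delta^{(l+1)}_{n,h} w^{(l+1)}_{h,h_l} & \text{if } l \neq L \end{cases}$$

    $$\frac{\partial E_n}{\partial w^{(l)}_{h_l,h_{l-1}}} = \delta^{(l)}_{n,h_l} z^{(l-1)}_{n,h_{l-1}}$$

    Here $\phi'$ is the derivative of $\phi$.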
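  9. Error backpropagation

    The error backpropagation algorithm:

    Forward pass:

    $$E(W) = f\!\left( W^{(L)} \phi(W^{(L-1)} \cdots \phi(W^{(1)} x_n) \cdots), \; y \right)$$

    Backward pass:

    $$\delta^{(l)}_{n,h_l} = \begin{cases} a^{(L)}_{n,h_l} - y_{n,h_l} & \text{if } l = L \\ \phi'(a^{(l)}_{n,h_l}) \sum_{h=1}^{H_{l+1}} \delta^{(l+1)}_{n,h} w^{(l+1)}_{h,h_l} & \text{if } l \neq L \end{cases}$$

    Gradient computation:

    $$\frac{\partial E_n}{\partial w^{(l)}_{h_l,h_{l-1}}} = \delta^{(l)}_{n,h_l} z^{(l-1)}_{n,h_{l-1}}$$

    Parameter update:

    $$w_{\mathrm{new}} = w_{\mathrm{old}} - \alpha \nabla_w E(w) \big|_{w=w_{\mathrm{old}}}$$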
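The forward pass, the $\delta$ recursion, and the gradient formula can be checked numerically on a tiny network. The sketch below assumes $\phi = \tanh$ for the hidden layers, a linear output layer, and the squared error $E_n$ from the earlier slide; the weights, the input, and the helper `numeric_grad` are hypothetical and exist only to compare the backpropagated gradients against central finite differences.

```python
import math

def forward(Ws, x):
    """Return activations a[l] and unit outputs zs[l], with zs[0] = x."""
    zs, activations = [list(x)], []
    for l, W in enumerate(Ws):
        a = [sum(w * z for w, z in zip(row, zs[-1])) for row in W]
        activations.append(a)
        # Hidden layers apply phi = tanh; the last layer is left linear.
        zs.append(a if l == len(Ws) - 1 else [math.tanh(v) for v in a])
    return activations, zs

def loss(Ws, x, y):
    """E_n = 1/2 * sum_d (y_d - a^(L)_d)^2."""
    activations, _ = forward(Ws, x)
    return 0.5 * sum((y_d - a_d) ** 2 for y_d, a_d in zip(y, activations[-1]))

def backprop(Ws, x, y):
    """Gradients dE_n/dW for every layer via the delta recursion."""
    activations, zs = forward(Ws, x)
    delta = [a_d - y_d for a_d, y_d in zip(activations[-1], y)]  # case l = L
    grads = [None] * len(Ws)
    for l in range(len(Ws) - 1, -1, -1):
        # Gradient for this layer: delta_h times the incoming unit z_h'.
        grads[l] = [[d * z for z in zs[l]] for d in delta]
        if l > 0:
            W = Ws[l]
            # Delta one layer down: phi'(a_h) * sum_k delta_k * w_{k,h},
            # with phi'(a) = 1 - tanh(a)^2.
            delta = [
                (1.0 - math.tanh(activations[l - 1][h]) ** 2)
                * sum(delta[k] * W[k][h] for k in range(len(W)))
                for h in range(len(W[0]))
            ]
    return grads

def numeric_grad(Ws, x, y, l, i, j, eps=1e-6):
    """Central finite difference of E_n with respect to Ws[l][i][j]."""
    original = Ws[l][i][j]
    Ws[l][i][j] = original + eps
    up = loss(Ws, x, y)
    Ws[l][i][j] = original - eps
    down = loss(Ws, x, y)
    Ws[l][i][j] = original
    return (up - down) / (2 * eps)

# Hypothetical 3-layer network: 2 inputs -> 3 units -> 2 units -> 1 output.
Ws = [[[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]],
      [[0.3, -0.2, 0.5], [0.1, 0.4, -0.3]],
      [[0.7, -0.6]]]
x, y = [1.0, -2.0], [0.5]
grads = backprop(Ws, x, y)
```

Note that the code indexes layers from 0, so `Ws[l]` plays the role of $W^{(l+1)}$ in the slides.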
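  10. Training classification models

    The feedforward neural networks described so far apply to regression problems. What about classification (discrimination) problems?

    Binary classification
    Number of classes: 2
    Output dimension: 1
    Output: $a^{(L)}_n \in \mathbb{R}$
    Activation function applied to the output: sigmoid function, $\mu_n = \mathrm{Sig}(a^{(L)}_n) \in (0, 1)$
    Error function: $E(W) = -\sum_{n=1}^{N} \{ y_n \log \mu_n + (1 - y_n) \log(1 - \mu_n) \}$

    Multiclass classification
    Number of classes: $D$
    Output dimension: $D$
    Output: $a^{(L)}_n \in \mathbb{R}^{D}$
    Activation function applied to the output: softmax function, $\pi_d(a^{(L)}_n)$ with $\sum_{d=1}^{D} \pi_d(a^{(L)}_n) = 1$
    Error function: $E(W) = -\sum_{n=1}^{N} \sum_{d=1}^{D} y_{n,d} \log \pi_d(a^{(L)}_n)$

    These error functions are called cross-entropy error functions.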
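The two cross-entropy error functions can be written down directly. The following is a minimal sketch using only the standard library; the function names are chosen for this example, and subtracting the maximum inside `softmax` is a common numerical-stability measure that the slides do not mention.

```python
import math

def sigmoid(a):
    """Sig(a) = 1 / (1 + exp(-a)), mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def softmax(a):
    """pi_d(a) = exp(a_d) / sum_k exp(a_k); the outputs sum to 1."""
    m = max(a)  # subtracting the max does not change the result
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def binary_cross_entropy(ys, activations):
    """E = -sum_n { y_n log mu_n + (1 - y_n) log(1 - mu_n) }."""
    return -sum(
        y * math.log(sigmoid(a)) + (1 - y) * math.log(1 - sigmoid(a))
        for y, a in zip(ys, activations)
    )

def categorical_cross_entropy(Y, A):
    """E = -sum_n sum_d y_{n,d} log pi_d(a_n), with one-hot rows in Y."""
    return -sum(
        y_d * math.log(p_d)
        for y, a in zip(Y, A)
        for y_d, p_d in zip(y, softmax(a))
    )
```

For $D = 2$ the softmax of $(a, 0)$ gives $\mathrm{Sig}(a)$ as its first component, so the binary case is a special case of the multiclass one.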
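  11. Stochastic gradient descent

    A method that computes the gradient using all of the training data at once, like the gradient descent described earlier, is called batch learning. Because it uses all of the data, it is computationally inefficient.

    $\Longrightarrow$ Improve efficiency through sample selection.

    Let $\mathcal{D}$ be the set of input-output pairs in the training data and $N$ the number of pairs. Sample selection picks $M \, (< N)$ pairs at random from the training data. If $\mathcal{S}$ is the set of indices of the selected pairs, the selected subset can be written as

    $$\mathcal{D}_{\mathcal{S}} = \{ x_n, y_n \}_{n \in \mathcal{S}}$$

    The parameters are then updated using the error function

    $$E_{\mathcal{S}}(W) = \frac{N}{M} \sum_{n \in \mathcal{S}} E_n(W)$$

    This method is called stochastic gradient descent, and the subset $\mathcal{D}_{\mathcal{S}}$ is called a minibatch. When the pairs are selected uniformly at random, the expected value of $E_{\mathcal{S}}(W)$ equals the batch-learning error.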
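The claim that the rescaled minibatch error matches the batch error in expectation can be verified exactly on a small example by averaging over every possible minibatch. The per-example error values below are hypothetical numbers standing in for $E_n(W)$ at some fixed $W$.

```python
import itertools
import random

def minibatch_error(per_example_errors, indices):
    """E_S = (N / M) * sum of E_n over the selected indices."""
    N, M = len(per_example_errors), len(indices)
    return (N / M) * sum(per_example_errors[n] for n in indices)

errors = [0.5, 1.5, 2.0, 4.0, 0.25]   # hypothetical E_n(W) values, N = 5
full_error = sum(errors)              # batch error E(W)

# Exact expectation under uniform selection: average over all subsets of size 2.
subsets = list(itertools.combinations(range(len(errors)), 2))
expected = sum(minibatch_error(errors, s) for s in subsets) / len(subsets)

# A single random minibatch gives a noisy estimate of the same quantity.
rng = random.Random(0)
sample = rng.sample(range(len(errors)), 2)
one_estimate = minibatch_error(errors, sample)
```

Here `expected` agrees with `full_error` up to floating-point rounding, while `one_estimate` generally differs from it; the factor $N/M$ is what makes the estimator unbiased.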
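  12. Stochastic gradient descent

    Let $\alpha_i$ be the learning rate used in the $i$-th parameter update. When

    $$\sum_{i=1}^{\infty} \alpha_i = \infty, \qquad \sum_{i=1}^{\infty} \alpha_i^2 < \infty$$

    minibatch learning converges with probability 1. The method that uses this to search, through minibatch learning, for the parameters that minimize the loss function over the entire data set is called the Robbins-Monro algorithm.

    $\Longrightarrow$ To make stochastic gradient descent more efficient, introduce a velocity vector into the parameter update.

    Momentum method:

    $$p_{\mathrm{new}} = \beta p_{\mathrm{old}} - \alpha \nabla_w E(w) \big|_{w=w_{\mathrm{old}}}$$

    $$w_{\mathrm{new}} = w_{\mathrm{old}} + p_{\mathrm{new}}$$

    $\beta \in [0, 1)$ is a parameter that controls the influence of past gradients.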
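The momentum update can be sketched as below. For simplicity this example uses the full gradient of a hypothetical quadratic error rather than a minibatch gradient, and the values of $\alpha$, $\beta$, and the step count are assumptions chosen so that the iteration is stable on this particular function.

```python
def momentum_descent(grad, w0, alpha=0.05, beta=0.9, steps=500):
    """p_new = beta * p_old - alpha * grad(w_old); w_new = w_old + p_new."""
    w = list(w0)
    p = [0.0] * len(w)   # velocity vector, started at zero (an assumption)
    for _ in range(steps):
        g = grad(w)
        p = [beta * p_i - alpha * g_i for p_i, g_i in zip(p, g)]
        w = [w_i + p_i for w_i, p_i in zip(w, p)]
    return w

# Hypothetical error function: E(w) = (w1 - 3)^2 + 2 * (w2 + 1)^2.
def grad_E(w):
    return [2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)]

w_momentum = momentum_descent(grad_E, [0.0, 0.0])
```

Setting `beta=0.0` recovers plain gradient descent. A decaying schedule such as $\alpha_i = 1/i$ is one standard choice that satisfies both Robbins-Monro conditions, although this sketch keeps $\alpha$ fixed.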
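  13. Convolutional neural networks

    A convolutional neural network (CNN) is a model that incorporates convolution. It is effective for time series and images.

    Taking an image as the input, write it as $X \in \mathbb{R}^{H \times W}$ and let the weight parameters (filter) be $W \in \mathbb{R}^{M \times N}$. The $(i, j)$-th element of the convolved image (feature map) $S$ is then

    $$S_{i,j} = (W * X)_{i,j} = \sum_{m=1}^{M} \sum_{n=1}^{N} W_{m,n} X_{i+m-1, \, j+n-1}$$

    Whereas a feedforward neural network is fully connected, a CNN is said to be sparsely connected. A typical CNN inserts a nonlinear function called pooling (e.g. max pooling) after the convolution.

    (See the figure on p. ….)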
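The feature-map formula translates almost literally into code. This sketch assumes no padding and a stride of 1, so the output has size $(H - M + 1) \times (W - N + 1)$; the slides do not state how the borders are handled. The image and filter values are made up for the example, and the filter is named `K` in the code to avoid clashing with the image width.

```python
def conv2d_valid(X, K):
    """S[i][j] = sum over m, n of K[m][n] * X[i + m][j + n] (0-based indices)."""
    height, width = len(X), len(X[0])
    M, N = len(K), len(K[0])
    return [
        [
            sum(K[m][n] * X[i + m][j + n] for m in range(M) for n in range(N))
            for j in range(width - N + 1)
        ]
        for i in range(height - M + 1)
    ]

# Hypothetical 3x3 image and 2x2 filter.
X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
K = [[1, 0],
     [0, -1]]
S = conv2d_valid(X, K)
```

Each output element depends only on an $M \times N$ patch of the input, which is the sparse connectivity mentioned above, and the same filter `K` is reused at every position.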
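  14. Recurrent neural networks

    A recurrent neural network (RNN) is a neural network that represents the sequential information in data.

    Let $z_n$ be the hidden units at time $n$ and $x_n$ the input data. The hidden units are given by

    $$z_n = \phi.(W_{zx} x_n + W_{zz} z_{n-1} + b_z)$$

    The parameters $W_{zx}, W_{zz}, b_z$ are shared across the whole model, and $\phi.$ is an elementwise nonlinear function.

    Writing the softmax function as $\pi$, the output $\pi_n$ at time $n$ is computed as

    $$\pi_n = \pi(W_{yz} z_n + b_y)$$

    Letting $\Theta$ be the set of model parameters, the error over the entire sequence is therefore

    $$E(\Theta) = \sum_{n=1}^{N} E_n(\Theta) = \sum_{n=1}^{N} \left( -\sum_{d=1}^{D} y_{n,d} \log \pi_{n,d} \right)$$

    and the model is optimized to minimize it.

    (See the figure on p. ….)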
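A forward pass through this RNN and the resulting sequence error can be sketched as follows. The initial hidden state $z_0 = 0$, the choice of `tanh` for the elementwise nonlinearity, and all parameter values are assumptions made for illustration; the slides specify none of them.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(a):
    m = max(a)  # shift by the max for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(xs, Wzx, Wzz, bz, Wyz, by):
    """Return the outputs pi_n for every time step n of the sequence xs."""
    z = [0.0] * len(bz)          # assumption: initial hidden state z_0 = 0
    outputs = []
    for x in xs:
        pre = [u + v + b for u, v, b in zip(matvec(Wzx, x), matvec(Wzz, z), bz)]
        z = [math.tanh(p) for p in pre]   # z_n = phi(Wzx x_n + Wzz z_{n-1} + bz)
        logits = [u + b for u, b in zip(matvec(Wyz, z), by)]
        outputs.append(softmax(logits))   # pi_n = softmax(Wyz z_n + by)
    return outputs

def sequence_error(outputs, ys):
    """E = sum_n ( -sum_d y_{n,d} log pi_{n,d} ), with one-hot targets ys."""
    return -sum(
        y_d * math.log(p_d)
        for pi, y in zip(outputs, ys)
        for y_d, p_d in zip(y, pi)
    )

# Hypothetical sequence: 3 time steps, 1 input, 2 hidden units, D = 2 classes.
xs = [[1.0], [0.5], [-1.0]]
ys = [[1, 0], [0, 1], [1, 0]]
Wzx = [[0.5], [-0.3]]
Wzz = [[0.1, 0.2], [0.0, -0.1]]
bz = [0.0, 0.1]
Wyz = [[1.0, -1.0], [0.5, 0.5]]
by = [0.0, 0.0]
outputs = rnn_forward(xs, Wzx, Wzz, bz, Wyz, by)
error = sequence_error(outputs, ys)
```

The same `Wzx`, `Wzz`, and `bz` are applied at every time step, which is the parameter sharing described above.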