PRML (Neural Networks)

gucchi
September 20, 2019

Transcript

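  1. 0. About this seminar

    This seminar focuses on neural networks as covered in Chapter 5 of PRML.

    It does not follow the route common in deep learning books, which starts by drawing a neural network as a graph of nodes and edges, moves on to matrix operations and error functions, and then explains error backpropagation.

    Instead, neural networks are introduced as an extension of the linear regression model (PRML Chapter 3) and of logistic regression (PRML Chapter 4) (Section 2 of these slides). After that, we discuss the symmetries of the network weights (Section 3) and loss functions and regularization (Section 4).

    Linear regression and logistic regression are therefore assumed to be known.

    Note that the equation numbers in these slides differ from the equation numbers in PRML.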
  2. Contents

    1. Introduction
    2. Neural network functions (PRML 5.1)
    3. Weight-space symmetries (PRML 5.1.1)
    4. Loss functions and regularization (PRML 5.2, 1.2.5)
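  3. 1. Introduction

    Throughout these slides, the set of training input vectors is written $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$, and each input vector $\mathbf{x}_n$ is $D$-dimensional.

    The set of target vectors corresponding to these inputs is written $\{\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N\}$, and each $\mathbf{t}_n$ is $K$-dimensional.

    In supervised machine learning (not only with neural networks), the goal is to use the prepared training data to build a function $\mathbf{y}(\mathbf{x})$ that predicts the target vector from the input, and then to predict the target vector $\mathbf{t}$ of an unseen input $\mathbf{x}$ by $\mathbf{y}(\mathbf{x})$.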
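  4. 1. Introduction

    In practice, however, the prediction function $y(\mathbf{x})$ is not built from scratch out of the training data.

    In PRML Chapter 3 (linear regression), we took $K = 1$ and restricted the discussion to functions $y(\mathbf{x}, \mathbf{w})$ of the form

    $$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) \tag{1.1}$$

    Here $\mathbf{w} = (w_0, w_1, \dots, w_{M-1})^{\mathrm{T}}$ is the parameter vector.

    Instead of building $y(\mathbf{x})$ from scratch, we use the training data to tune the parameter vector ($\mathbf{w} = \mathbf{w}^{\star}$) and use $y(\mathbf{x}, \mathbf{w} = \mathbf{w}^{\star})$ as the prediction function for the target variable.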
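  5. 1. Introduction

    The vector-valued function $\boldsymbol{\phi}(\mathbf{x})$, called the feature vector, is defined as $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm{T}}$, where $\phi_0(\mathbf{x}) = 1$ and the remaining $\phi_j(\mathbf{x})$ ($j = 1, \dots, M-1$) are some nonlinear functions (basis functions).

    One example is the Gaussian basis function:

    $$\phi_j(\mathbf{x}) = \exp\left\{ -\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2 s^2} \right\} \tag{1.2}$$

    This basis function is centered at $\mathbf{x} = \boldsymbol{\mu}_j$, and its spread is governed by $s^2$.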
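The linear basis function model of (1.1) and (1.2) can be sketched in a few lines. This is only an illustration for a scalar input x; the centers, the width `s`, and the weights below are hypothetical values chosen by hand, not values from the slides.

```python
import math

def gaussian_features(x, centers, s):
    """Return phi(x) = (1, phi_1(x), ..., phi_{M-1}(x)) for a scalar input x."""
    return [1.0] + [math.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in centers]

def linear_model(x, w, centers, s):
    """y(x, w) = w^T phi(x), as in Eq. (1.1)."""
    phi = gaussian_features(x, centers, s)
    return sum(wj * pj for wj, pj in zip(w, phi))

centers = [-1.0, 0.0, 1.0]   # hypothetical mu_j, fixed by hand (hyperparameters)
s = 0.5                      # hypothetical width, fixed by hand (hyperparameter)
w = [0.1, 1.0, -2.0, 0.5]    # hypothetical weights w_0..w_3, so M = 4

phi_at_zero = gaussian_features(0.0, centers, s)
y_at_zero = linear_model(0.0, w, centers, s)
```

Only `w` would be fitted to data here; `centers` and `s` stay fixed, which is what keeps the model linear in its learnable parameters.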
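  6. 1. Introduction

    In logistic regression, discussed in PRML Chapter 4, we again took $K = 1$ and restricted the discussion to functions $y(\mathbf{x}, \mathbf{w})$ of the form

    $$y(\mathbf{x}, \mathbf{w}) = \sigma(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})) \tag{1.3}$$

    Here $\sigma(x)$ is the logistic sigmoid function, defined by

    $$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1.4}$$

    [Figure: plot of the logistic sigmoid function]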
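  7. 1. Introduction

    In regression, the value of the prediction function $y(\mathbf{x})$ is used directly as the prediction of the target variable. Logistic regression is a classification method: given an input vector $\mathbf{x}$, we assign $\mathbf{x}$ to class 1 ($t = 1$) if $y(\mathbf{x}) \geq 0.5$ (equivalently $\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) \geq 0$), and to class 2 ($t = 0$) if $y(\mathbf{x}) < 0.5$.

    In summary, both linear regression and logistic regression assume that the prediction function has the specific form

    $$y(\mathbf{x}, \mathbf{w}) = f(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})) \tag{1.5}$$

    and produce the prediction function by tuning the parameter $\mathbf{w}$ with the training data.

    Here $f(\cdot)$ is a function that is nonlinear in general. (Linear regression uses the identity function, and logistic regression uses the logistic sigmoid function.)

    Choosing a particular form for the vector function $\boldsymbol{\phi}(\mathbf{x})$ turns this model into a neural network model.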
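The common form (1.5) can be illustrated with one helper that takes the output function `f` as an argument. This is a minimal sketch: the feature vector and weights are hypothetical, and the class rule assumes the usual convention that class 1 is chosen when the sigmoid output is at least 0.5.

```python
import math

def sigmoid(a):
    """Logistic sigmoid of Eq. (1.4)."""
    return 1.0 / (1.0 + math.exp(-a))

def predict(phi, w, f):
    """y = f(w^T phi), the shared form of Eq. (1.5)."""
    a = sum(wj * pj for wj, pj in zip(w, phi))
    return f(a)

phi = [1.0, 0.2, -0.7]   # hypothetical feature vector with phi_0 = 1
w = [0.5, -1.0, 2.0]     # hypothetical weights

y_regression = predict(phi, w, lambda a: a)      # identity output: linear regression
y_classification = predict(phi, w, sigmoid)      # sigmoid output: logistic regression
predicted_class = 1 if y_classification >= 0.5 else 0
```

Because the sigmoid is monotonic with value 0.5 at zero, thresholding the output at 0.5 is the same as checking the sign of the linear term.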
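  8. 2. Neural network functions

    So far we have explained that both linear regression and logistic regression assume a prediction function $y(\mathbf{x}, \mathbf{w})$ of the form

    $$y(\mathbf{x}, \mathbf{w}) = f(\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})) \tag{2.1}$$

    As a concrete example, $\boldsymbol{\phi}(\mathbf{x})$ is defined as $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^{\mathrm{T}}$ with $\phi_0(\mathbf{x}) = 1$, and the remaining $\phi_j(\mathbf{x})$ ($j = 1, \dots, M-1$) can be taken to be Gaussian basis functions:

    $$\phi_j(\mathbf{x}) = \exp\left\{ -\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2 s^2} \right\} \tag{2.2}$$

    Unlike the parameter $\mathbf{w}$, which is tuned with the training data, the parameters $\boldsymbol{\mu}_j$ ($j = 1, \dots, M-1$) and $s^2$ of the Gaussian basis functions are hyperparameters that are chosen by hand when the form of $y(\mathbf{x}, \mathbf{w})$ is fixed. (If they were learned parameters, the model would no longer be a "linear" regression.)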
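  9. 2. Neural network functions

    In a neural network, the feature vector $\boldsymbol{\phi}(\mathbf{x})$ itself is chosen so that it depends on learned parameters.

    Just as the Gaussian basis functions had the parameters $\boldsymbol{\mu}_j$ ($j = 1, \dots, M-1$) and $s^2$, we give each basis function $\phi_j(\mathbf{x})$ ($j = 0, \dots, M-1$) its own independent parameter vector $\mathbf{w}^{(1)}_j$.

    We then transpose these column vectors $\mathbf{w}^{(1)}_j$ and stack them vertically to form the matrix $\mathbf{W}^{(1)}$:

    $$\mathbf{W}^{(1)} = \left( \mathbf{w}^{(1)}_0, \mathbf{w}^{(1)}_1, \dots, \mathbf{w}^{(1)}_{M-1} \right)^{\mathrm{T}} \tag{2.3}$$

    The feature vector depends on the matrix $\mathbf{W}^{(1)}$, so we write it as $\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)})$.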
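  10. 2. Neural network functions

    Using the vector function $\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)})$, which depends on learned parameters, the prediction function $y(\mathbf{x}, \mathbf{w})$ becomes

    $$y(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{w}^{(2)\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)}) \right) \tag{2.4}$$

    Here $\mathbf{w}$ denotes all parameters, that is, the parameter vector $\mathbf{w}^{(2)}$ together with $\mathbf{W}^{(1)}$. In other words, $\mathbf{w}^{(2)}$ consists of the parameters in $\mathbf{w}$ other than $\mathbf{W}^{(1)}$.

    We now restrict the feature vector to the following form, where $h(x)$ is some nonlinear function:

    $$\boldsymbol{\phi}(\mathbf{x}; \mathbf{W}^{(1)}) = h\left( \mathbf{W}^{(1)} \mathbf{x} \right) = \left( h\left( \sum_{i=0}^{D} w^{(1)}_{0i} x_i \right), h\left( \sum_{i=0}^{D} w^{(1)}_{1i} x_i \right), \dots, h\left( \sum_{i=0}^{D} w^{(1)}_{M-1,i} x_i \right) \right)^{\mathrm{T}} \tag{2.5}$$

    The $(j, i)$ element of the matrix $\mathbf{W}^{(1)}$ is written $w^{(1)}_{ji}$.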
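  11. 2. Neural network functions

    When the function $h(x)$, which takes a scalar argument, is given a vector argument, it is applied element-wise and returns a vector of the same dimension as its argument:

    $$h(\mathbf{a}) = (h(a_1), h(a_2), \dots, h(a_D))^{\mathrm{T}} \tag{2.6}$$

    With the vector function restricted as in (2.5), the prediction function $y(\mathbf{x}, \mathbf{w})$ is a neural network function with one hidden layer and one output unit, whose hidden-layer and output-layer activation functions are $h$ and $f$ respectively:

    $$y(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{w}^{(2)\mathrm{T}} h\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{2.7}$$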
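Equation (2.7) can be read directly as a forward pass. The sketch below assumes that the input vector carries a constant component x_0 = 1 (so that the sums over i = 0, ..., D include a bias), and it uses tanh and the identity as example choices for h and f; all numbers are hypothetical.

```python
import math

def forward(x, W1, w2, h=math.tanh, f=lambda a: a):
    """y(x, w) = f(w2^T h(W1 x)), Eq. (2.7); x is assumed to include x_0 = 1."""
    hidden = [h(sum(wji * xi for wji, xi in zip(row, x))) for row in W1]
    return f(sum(w2j * zj for w2j, zj in zip(w2, hidden)))

x = [1.0, 0.5, -1.5]                       # D = 2 inputs plus the assumed x_0 = 1
W1 = [[0.0, 1.0, 0.0], [0.2, -0.4, 0.1]]   # M = 2 hidden units, one row per unit
w2 = [2.0, -1.0]                           # second-layer weights
y = forward(x, W1, w2)
```

Each row of `W1` plays the role of one vector w_j^(1), so the list comprehension computes the feature vector of (2.5) before the output weights are applied.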
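  12. 2. Neural network functions

    As a further generalization, we extend the prediction function $y(\mathbf{x}, \mathbf{w})$ to a vector-valued prediction function $\mathbf{y}(\mathbf{x}, \mathbf{w})$ with $K$ components, and write its $k$-th component as $y_k(\mathbf{x}, \mathbf{w})$.

    This generalization corresponds to increasing the number of output units of the network from 1 to $K$.

    If, in place of the weight vector $\mathbf{w}^{(2)}$ appearing in (2.7), we prepare an independent parameter vector $\mathbf{w}^{(2)}_k$ for each component of $\mathbf{y}(\mathbf{x}, \mathbf{w})$, the $k$-th component becomes

    $$y_k(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{w}^{(2)\mathrm{T}}_k h\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{2.8}$$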
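  13. 2. Neural network functions

    As with $\mathbf{W}^{(1)}$, we transpose the column vectors $\mathbf{w}^{(2)}_k$ and stack them vertically to form the matrix $\mathbf{W}^{(2)}$:

    $$\mathbf{W}^{(2)} = \left( \mathbf{w}^{(2)}_1, \mathbf{w}^{(2)}_2, \dots, \mathbf{w}^{(2)}_K \right)^{\mathrm{T}} \tag{2.9}$$

    The vector function $\mathbf{y}(\mathbf{x}, \mathbf{w})$ then becomes the following, which is a neural network function with one hidden layer and $K$ output units:

    $$\mathbf{y}(\mathbf{x}, \mathbf{w}) = f\left( \mathbf{W}^{(2)} h\left( \mathbf{W}^{(1)} \mathbf{x} \right) \right) \tag{2.10}$$

    Writing the $(j, i)$ element of $\mathbf{W}^{(1)}$ as $w^{(1)}_{ji}$ and the $(k, j)$ element of $\mathbf{W}^{(2)}$ as $w^{(2)}_{kj}$, the prediction function $y_k(\mathbf{x}, \mathbf{w})$ takes the following (familiar) form:

    $$y_k(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{2.11}$$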
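The matrix form (2.10) and the index form (2.11) describe the same computation. The following sketch checks this numerically for a small network; the weights are hypothetical, tanh and the logistic sigmoid are example choices for h and f, and x is assumed to include x_0 = 1.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def forward_matrix(x, W1, W2, h, f):
    """Eq. (2.10): y = f(W2 h(W1 x)), activations applied element-wise."""
    z = [h(a) for a in matvec(W1, x)]
    return [f(a) for a in matvec(W2, z)]

def forward_sums(x, W1, W2, h, f):
    """Eq. (2.11): y_k = f(sum_j W2[k][j] * h(sum_i W1[j][i] * x[i]))."""
    y = []
    for k in range(len(W2)):
        total = 0.0
        for j in range(len(W1)):
            a_j = sum(W1[j][i] * x[i] for i in range(len(x)))
            total += W2[k][j] * h(a_j)
        y.append(f(total))
    return y

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

x = [1.0, 0.5, -1.5]                          # assumed x_0 = 1, D = 2
W1 = [[0.0, 1.0, 0.0], [0.2, -0.4, 0.1]]      # M = 2 hidden units
W2 = [[2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]]   # K = 3 output units
y_matrix = forward_matrix(x, W1, W2, math.tanh, sigmoid)
y_sums = forward_sums(x, W1, W2, math.tanh, sigmoid)
```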
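  14. 3. Weight-space symmetries

    An important property of the hyperbolic tangent is that it is an odd function:

    $$\tanh(-x) = \frac{e^{-x} - e^{-(-x)}}{e^{-x} + e^{-(-x)}} = -\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = -\tanh(x) \tag{3.3}$$

    Written without matrices, with $h = \tanh$ and $f = \sigma$, the $k$-th component $y_k(\mathbf{x}, \mathbf{w})$ of $\mathbf{y}(\mathbf{x}, \mathbf{w})$ is

    $$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{3.4}$$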
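  15. 3. Weight-space symmetries

    On the right-hand side of (3.4), consider the sign-flip transformation $w^{(1)}_{1i} \to -w^{(1)}_{1i}$ for $j = 1$ and all $i$.

    Because $\tanh$ is odd, the right-hand side of (3.4) changes as follows:

    $$\begin{aligned} y_k(\mathbf{x}, \mathbf{w}) &= \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \\ &= \sigma\left( w^{(2)}_{k0} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{0i} x_i \right) + w^{(2)}_{k1} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{1i} x_i \right) + \cdots \right) \\ &\to \sigma\left( w^{(2)}_{k0} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{0i} x_i \right) - w^{(2)}_{k1} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{1i} x_i \right) + \cdots \right) \end{aligned} \tag{3.5}$$

    Therefore, even if we apply $w^{(1)}_{1i} \to -w^{(1)}_{1i}$ for all $i$, the function $y_k(\mathbf{x}, \mathbf{w})$ stays unchanged provided we simultaneously apply $w^{(2)}_{k1} \to -w^{(2)}_{k1}$ for all $k$.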
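The sign-flip argument can be checked numerically. The sketch below uses a small network of the form (3.4) with hypothetical weights and an input that is assumed to include x_0 = 1. It flips the first-layer weights of hidden unit j = 1 alone, which changes the output, and then also flips the second-layer weights attached to that unit, which restores it.

```python
import math

def network(x, W1, W2):
    """y_k = sigmoid(sum_j W2[k][j] * tanh(sum_i W1[j][i] * x[i])), Eq. (3.4)."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [1.0 / (1.0 + math.exp(-sum(w * zj for w, zj in zip(row, z)))) for row in W2]

def flip_first_layer(W1, j):
    """Negate all weights feeding into hidden unit j."""
    return [[-w for w in row] if r == j else list(row) for r, row in enumerate(W1)]

def flip_second_layer(W2, j):
    """Negate all weights leaving hidden unit j."""
    return [[-w if c == j else w for c, w in enumerate(row)] for row in W2]

x = [1.0, 0.5, -1.5]
W1 = [[0.0, 1.0, 0.0], [0.2, -0.4, 0.1]]
W2 = [[2.0, -1.0], [0.5, 0.5]]

y = network(x, W1, W2)
y_half = network(x, flip_first_layer(W1, 1), W2)
y_full = network(x, flip_first_layer(W1, 1), flip_second_layer(W2, 1))

change_half = max(abs(a - b) for a, b in zip(y, y_half))
change_full = max(abs(a - b) for a, b in zip(y, y_full))
```

Here `change_half` is clearly nonzero while `change_full` vanishes up to rounding, matching the statement that both layers must be flipped together.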
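  16. 3. Weight-space symmetries

    Since $j$ takes the $M$ values $j = 0, 1, \dots, M-1$, there are $M$ transformations of the form $\{(w^{(1)}_{ji}, w^{(2)}_{kj})\}_{i,k} \to \{(-w^{(1)}_{ji}, -w^{(2)}_{kj})\}_{i,k}$, one for each $j$, that leave the function $y_k(\mathbf{x}, \mathbf{w})$ invariant.

    Each of these $M$ sign flips can be applied or not applied independently of the others. Hence, once weights $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ optimized by learning have been obtained, there are $2^M$ weight settings, including $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ themselves, that give an equivalent output $y_k(\mathbf{x}, \mathbf{w})$ for any input.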
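The count of sign-flip symmetries can be verified by brute force for a small M. This sketch enumerates every pattern of signs over the hidden units of a randomly initialized network (hypothetical sizes, fixed seed, input assumed to include x_0 = 1) and confirms that all resulting weight settings are distinct yet produce the same output.

```python
import itertools
import math
import random

def network(x, W1, W2):
    """y_k = sigmoid(sum_j W2[k][j] * tanh(sum_i W1[j][i] * x[i]))."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [1.0 / (1.0 + math.exp(-sum(w * zj for w, zj in zip(row, z)))) for row in W2]

rng = random.Random(0)
D, M, K = 2, 3, 2
W1 = [[rng.uniform(-1, 1) for _ in range(D + 1)] for _ in range(M)]
W2 = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(K)]
x = [1.0] + [rng.uniform(-1, 1) for _ in range(D)]
y_ref = network(x, W1, W2)

equivalent = set()
max_diff = 0.0
for signs in itertools.product([1, -1], repeat=M):
    # Flip the incoming and outgoing weights of every unit whose sign is -1.
    W1s = [[s * w for w in row] for s, row in zip(signs, W1)]
    W2s = [[s * w for s, w in zip(signs, row)] for row in W2]
    y = network(x, W1s, W2s)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(y, y_ref)))
    equivalent.add((tuple(map(tuple, W1s)), tuple(map(tuple, W2s))))

num_equivalent = len(equivalent)   # number of distinct equivalent weight settings
```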
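  17. 3. Weight-space symmetries

    There is a second kind of symmetry. In the function

    $$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{3.6}$$

    if we swap the set of weights $\{(w^{(1)}_{j_1 i}, w^{(2)}_{k j_1})\}_{i,k}$ for some $j = j_1$ with the set of weights $\{(w^{(1)}_{j_2 i}, w^{(2)}_{k j_2})\}_{i,k}$ for $j = j_2$, the output $y_k(\mathbf{x}, \mathbf{w})$ does not change for any input $\mathbf{x}$ (interchange symmetry).

    This amounts to changing the order of the sum over $j$ on the right-hand side of (3.6).

    Therefore, once weights $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ optimized by learning have been obtained, this interchange invariance implies that there are $M!$ weight settings, including $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ themselves, that give an equivalent output $y_k(\mathbf{x}, \mathbf{w})$ for any input.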
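The interchange symmetry can be checked the same way. The sketch below permutes the hidden units of a small random network (hypothetical sizes, fixed seed, input assumed to include x_0 = 1), moving the rows of the first-layer matrix and the columns of the second-layer matrix together.

```python
import itertools
import math
import random

def network(x, W1, W2):
    """y_k = sigmoid(sum_j W2[k][j] * tanh(sum_i W1[j][i] * x[i])), Eq. (3.6)."""
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [1.0 / (1.0 + math.exp(-sum(w * zj for w, zj in zip(row, z)))) for row in W2]

rng = random.Random(1)
D, M, K = 2, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(D + 1)] for _ in range(M)]
W2 = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(K)]
x = [1.0] + [rng.uniform(-1, 1) for _ in range(D)]
y_ref = network(x, W1, W2)

num_permutations = 0
max_diff = 0.0
for perm in itertools.permutations(range(M)):
    W1p = [W1[j] for j in perm]                        # reorder rows of W1
    W2p = [[row[j] for j in perm] for row in W2]       # reorder columns of W2 to match
    y = network(x, W1p, W2p)
    max_diff = max(max_diff, max(abs(a - b) for a, b in zip(y, y_ref)))
    num_permutations += 1
```

The outputs agree only up to floating-point rounding, because reordering the sum over j changes the order of additions. Combining the sign flips with these permutations gives M! · 2^M equivalent weight settings in total, which is the overall factor quoted in PRML 5.1.1.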
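  18. 4. Loss functions and regularization

    We have seen that, in general, the output of the $k$-th output unit of a neural network with one hidden layer is given by

    $$y_k(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=0}^{M-1} w^{(2)}_{kj} h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{4.1}$$

    Here the functions $h$ and $f$ are nonlinear functions called activation functions, and $w^{(1)}_{ji}$ and $w^{(2)}_{kj}$ are the weights of each layer.

    Write the set of training input vectors as $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ and the set of corresponding target vectors as $\{\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N\}$. A common way to optimize the parameters in regression is to choose them so as to minimize the following sum-of-squares error:

    $$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \| \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \|^2 \tag{4.2}$$

    Here $\mathbf{y}(\mathbf{x}, \mathbf{w}) = (y_1(\mathbf{x}, \mathbf{w}), y_2(\mathbf{x}, \mathbf{w}), \dots, y_K(\mathbf{x}, \mathbf{w}))^{\mathrm{T}}$.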
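  19. 4. Loss functions and regularization

    If the network output $y_k(\mathbf{x}, \mathbf{w})$ is interpreted probabilistically, minimizing the sum-of-squares error turns out to be the result of maximum likelihood estimation.

    For simplicity, we consider the case of a single output unit:

    $$y(\mathbf{x}, \mathbf{w}) = f\left( \sum_{j=0}^{M-1} w^{(2)}_{j} h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{4.3}$$

    We start with the regression problem, in which each of the target variables $\{t_1, t_2, \dots, t_N\}$ takes a continuous value.

    For regression, we take the activation functions $f$ and $h$ to be the identity function and the hyperbolic tangent function respectively:

    $$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w^{(2)}_{j} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \tag{4.4}$$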
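  20. 4. Loss functions and regularization

    We assume that the training inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ are generated in some way (sampling methods are discussed in PRML Chapter 11), and that the corresponding target variables $\{t_1, t_2, \dots, t_N\}$ are each generated independently from the following Gaussian distribution, whose mean is the network output $y(\mathbf{x}, \mathbf{w})$:

    $$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1}) \tag{4.5}$$

    Here $\mathbf{w}$ and $\beta$ are the parameters tuned by learning.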
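  21. 4. Loss functions and regularization

    The Gaussian distribution is defined as follows. (It has two parameters, the mean $\mu$ and the variance $\sigma^2$.)

    $$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} \tag{4.6}$$

    In regression the random variable is continuous, so the Gaussian assumption is natural as far as the range of possible values is concerned. (For classification problems a different distribution is assumed.)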
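  22. 4. Loss functions and regularization

    Since the training data are generated independently from (4.5), the likelihood function can be written as a product over the data points:

    $$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) \tag{4.7}$$

    We look for the $\mathbf{w}$ and $\beta$ that maximize this likelihood function (maximum likelihood estimation).

    Instead of maximizing $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ directly, we find the parameters that maximize the logarithm of the likelihood function. Because the logarithm is monotonically increasing, the maximizer is the same.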
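  23. 4. Loss functions and regularization

    First, from

    $$\begin{aligned} \ln\left\{ \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) \right\} &= \ln\left[ \frac{\beta^{1/2}}{(2\pi)^{1/2}} \exp\left\{ -\frac{\beta}{2} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \right\} \right] \\ &= \frac{1}{2} \ln \beta - \frac{1}{2} \ln(2\pi) - \frac{\beta}{2} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \end{aligned} \tag{4.8}$$

    the log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ becomes

    $$\begin{aligned} \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) &= \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) \\ &= \sum_{n=1}^{N} \left[ \frac{1}{2} \ln \beta - \frac{1}{2} \ln(2\pi) - \frac{\beta}{2} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \right] \\ &= \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \frac{\beta}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \end{aligned} \tag{4.9}$$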
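The closed form (4.9) can be checked against a direct evaluation of the Gaussian log likelihood. In this sketch the targets, the network outputs, and the precision `beta` are hypothetical numbers; the outputs are given directly rather than computed by a network.

```python
import math

def gaussian_pdf(t, mean, var):
    """N(t | mean, var) as defined in Eq. (4.6)."""
    return math.exp(-(t - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

t = [0.3, -1.2, 0.8, 2.1]   # hypothetical targets t_n
y = [0.1, -1.0, 1.0, 1.5]   # hypothetical network outputs y(x_n, w)
beta = 2.5                  # hypothetical noise precision
N = len(t)

# Direct evaluation: sum of log densities, one per data point.
direct = sum(math.log(gaussian_pdf(tn, yn, 1.0 / beta)) for tn, yn in zip(t, y))

# Closed form of Eq. (4.9).
sum_sq = sum((tn - yn) ** 2 for tn, yn in zip(t, y))
closed_form = N / 2 * math.log(beta) - N / 2 * math.log(2 * math.pi) - beta / 2 * sum_sq
```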
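  24. 4. Loss functions and regularization

    If we define the sum-of-squares error $E(\mathbf{w})$ as

    $$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \tag{4.10}$$

    then $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)$ becomes

    $$\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E(\mathbf{w}) \tag{4.11}$$

    To find the maximum likelihood solution $\mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}}$, we consider the gradient of the log likelihood.

    The gradient with respect to $\mathbf{w}$ is $-\beta \nabla E(\mathbf{w})$, so the value of $\mathbf{w}$ at which it vanishes does not depend on $\beta$. We can therefore find $\mathbf{w}_{\mathrm{ML}}$ first and afterwards find $\beta_{\mathrm{ML}}$ using $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}_{\mathrm{ML}}, \beta)$.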
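As a short worked step, the noise precision follows directly from (4.9) once the weights are held fixed at their maximum likelihood value. Differentiating with respect to $\beta$ and setting the result to zero gives the mean squared residual as the estimated noise variance:

```latex
\frac{\partial}{\partial \beta} \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}_{\mathrm{ML}}, \beta)
  = \frac{N}{2\beta} - \frac{1}{2} \sum_{n=1}^{N} \bigl( t_n - y(\mathbf{x}_n, \mathbf{w}_{\mathrm{ML}}) \bigr)^2 = 0
\quad \Longrightarrow \quad
\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \bigl( t_n - y(\mathbf{x}_n, \mathbf{w}_{\mathrm{ML}}) \bigr)^2
```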
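  25. 4. Loss functions and regularization

    Consider first maximizing the log likelihood (4.11) with respect to $\mathbf{w}$. The first and second terms on the right-hand side of (4.11) do not depend on $\mathbf{w}$, so this is equivalent to maximizing the third term, $-\beta E(\mathbf{w})$.

    Since $\beta > 0$, maximizing the log likelihood (4.11) with respect to $\mathbf{w}$ is equivalent to minimizing the sum-of-squares error $E(\mathbf{w})$ of (4.10) with respect to $\mathbf{w}$.

    Thus, from a probabilistic point of view, minimizing the sum-of-squares error is the result of maximum likelihood estimation when the likelihood function is assumed to be Gaussian.

    In practice the minimization is carried out iteratively, using methods such as error backpropagation (as you probably know).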
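  26. 4. Loss functions and regularization

    Next we treat the classification problem, in which the target variables $\{t_1, t_2, \dots, t_N\}$ are discrete and take one of the two values 0 or 1.

    For classification, we take the activation functions $f$ and $h$ to be the logistic sigmoid function and the hyperbolic tangent function respectively:

    $$y(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M-1} w^{(2)}_{j} \tanh\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right) \tag{4.12}$$

    Because the output-layer activation function is the logistic sigmoid, $y(\mathbf{x}, \mathbf{w})$ takes values in the range $0 < y(\mathbf{x}, \mathbf{w}) < 1$.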
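  27. 4. Loss functions and regularization

    For classification we again assume that the training inputs $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N\}$ are generated in some way (sampling methods are discussed in PRML Chapter 11), and that the corresponding target variables $\{t_1, t_2, \dots, t_N\}$ are each generated independently from the following Bernoulli distribution:

    $$p(t \mid \mathbf{x}, \mathbf{w}) = y(\mathbf{x}, \mathbf{w})^{t} \, (1 - y(\mathbf{x}, \mathbf{w}))^{1-t} \tag{4.13}$$

    Here $\mathbf{w}$ is the parameter tuned by learning.

    The probability of $t = 1$ is $y(\mathbf{x}, \mathbf{w})$ and the probability of $t = 0$ is $1 - y(\mathbf{x}, \mathbf{w})$. Since $0 < y(\mathbf{x}, \mathbf{w}) < 1$, both satisfy the requirement on the range of a probability.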
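  28. 4. Loss functions and regularization

    Since the training data are generated independently from (4.13), the likelihood function can be written as a product over the data points:

    $$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} y(\mathbf{x}_n, \mathbf{w})^{t_n} \, (1 - y(\mathbf{x}_n, \mathbf{w}))^{1-t_n} \tag{4.14}$$

    We look for the $\mathbf{w}$ that maximizes this likelihood function.

    Instead of maximizing $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})$ directly, we find the parameters that maximize the logarithm of the likelihood function.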
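  29. 4. Loss functions and regularization

    The log likelihood $\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})$ is

    $$\begin{aligned} \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) &= \sum_{n=1}^{N} \ln\left\{ y(\mathbf{x}_n, \mathbf{w})^{t_n} \, (1 - y(\mathbf{x}_n, \mathbf{w}))^{1-t_n} \right\} \\ &= \sum_{n=1}^{N} \left\{ t_n \ln y(\mathbf{x}_n, \mathbf{w}) + (1 - t_n) \ln(1 - y(\mathbf{x}_n, \mathbf{w})) \right\} \\ &= -E(\mathbf{w}) \end{aligned} \tag{4.15}$$

    Here $E(\mathbf{w})$ is the cross-entropy error:

    $$E(\mathbf{w}) = -\sum_{n=1}^{N} \left\{ t_n \ln y(\mathbf{x}_n, \mathbf{w}) + (1 - t_n) \ln(1 - y(\mathbf{x}_n, \mathbf{w})) \right\} \tag{4.16}$$

    Thus, from a probabilistic point of view, minimizing the cross-entropy error is the result of maximum likelihood estimation when the likelihood function is assumed to be a Bernoulli distribution.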
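The relation between the Bernoulli likelihood (4.14) and the cross-entropy error (4.16) can be confirmed numerically. The labels and the network outputs below are hypothetical values; the outputs are given directly and are assumed to lie strictly between 0 and 1.

```python
import math

t = [1, 0, 1, 1]            # hypothetical binary targets t_n
y = [0.9, 0.2, 0.6, 0.3]    # hypothetical outputs y(x_n, w), each in (0, 1)

# Likelihood of Eq. (4.14): product of Bernoulli probabilities.
likelihood = math.prod(yn ** tn * (1 - yn) ** (1 - tn) for tn, yn in zip(t, y))

# Cross-entropy error of Eq. (4.16).
cross_entropy = -sum(
    tn * math.log(yn) + (1 - tn) * math.log(1 - yn) for tn, yn in zip(t, y)
)

negative_log_likelihood = -math.log(likelihood)
```

In a real training loop one would work with the sum of logarithms rather than the raw product, since the product of many probabilities underflows quickly.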
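  30. 4. Loss functions and regularization

    Returning to regression, recall that the parameter $\mathbf{w}$ is chosen so as to minimize the following sum-of-squares error:

    $$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \tag{4.17}$$

    A well-known phenomenon, called overfitting, is that a complex model such as a neural network fits the training data too closely when the number of data points is small.

    When overfitting occurs, the absolute values of the parameter components generally tend to become large. To prevent overfitting, learning is therefore often performed with a regularized sum-of-squares error, obtained by adding the following term to the sum-of-squares error:

    $$E(\mathbf{w}; \lambda) = \frac{1}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2 \tag{4.18}$$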
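The regularized error (4.18) is simple to compute once the network outputs are known. The sketch below takes the outputs as given and uses hypothetical numbers, so it illustrates only how the penalty term enters, not how the network is trained.

```python
def regularized_error(t, y, w, lam):
    """E(w; lambda) of Eq. (4.18): sum-of-squares error plus (lambda / 2) * ||w||^2."""
    data_term = 0.5 * sum((tn - yn) ** 2 for tn, yn in zip(t, y))
    penalty = 0.5 * lam * sum(wi ** 2 for wi in w)
    return data_term + penalty

t = [1.0, 2.0]     # hypothetical targets
y = [0.5, 2.5]     # hypothetical network outputs y(x_n, w)
w = [3.0, -4.0]    # hypothetical weight vector, ||w||^2 = 25

unregularized = regularized_error(t, y, w, 0.0)   # data term only
regularized = regularized_error(t, y, w, 0.1)     # data term plus penalty
```

For the same fit to the data, a weight vector with a larger norm receives a larger error, which is how the penalty discourages large weights.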
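  31. 4. Loss functions and regularization

    Here $\lambda$ is a positive hyperparameter, not a learned parameter.

    Because $\lambda$ is positive, adding the regularization term prevents the absolute values of the parameter components from becoming large. (See PRML 1.1 for details.)

    Finally, we show that, from a probabilistic point of view, this regularization term arises as a result of MAP estimation (maximum a posteriori estimation).

    To do so, we briefly explain posterior probabilities and Bayesian inference. (See PRML 1.2.3 for details.)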
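  32. 4. Loss functions and regularization

    So far (in maximum likelihood estimation) we have made a point estimate of the parameter $\mathbf{w}$ that maximizes the likelihood function.

    In Bayesian inference, the training data are used to obtain a probability distribution over the parameter $\mathbf{w}$ (one with a spread rather than a single point, called the posterior distribution).

    This posterior distribution is then used to obtain the predictive distribution $p(t \mid \mathbf{x}, \mathbf{t}, \mathbf{X})$ of the output $t$ given the input $\mathbf{x}$ of unseen data. (See PRML Eq. (1.68).)

    The word "posterior" means that it is the probability distribution of the parameter $\mathbf{w}$ after the training data have been observed, that is, the following conditional probability:

    $$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) \tag{4.19}$$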
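  33. 4. Loss functions and regularization

    On the other hand, using the product rule of probability (PRML Eq. (1.11)), the posterior distribution is proportional to the product of the likelihood function $p(\mathbf{t} \mid \mathbf{X}, \mathbf{w})$ and the prior distribution $p(\mathbf{w})$ (Bayes' theorem):

    $$p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) \propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) \, p(\mathbf{w}) \tag{4.20}$$

    The likelihood function for regression was given by

    $$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \beta^{-1}) \tag{4.21}$$

    so to obtain the posterior distribution we still need to assume a prior distribution $p(\mathbf{w})$.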
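  34. 4. Loss functions and regularization

    Here we assume as the prior a Gaussian distribution with mean $\mathbf{0}$ and covariance $\alpha^{-1} \mathbf{I}$:

    $$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I}) \tag{4.22}$$

    From these results, the posterior distribution $p(\mathbf{w} \mid \mathbf{t}, \mathbf{X})$ becomes

    $$\begin{aligned} p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}) &\propto p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) \, p(\mathbf{w}) \\ &\propto \exp\left( -\frac{\beta}{2} \sum_{n=1}^{N} (t_n - y(\mathbf{x}_n, \mathbf{w}))^2 \right) \cdot \exp\left( -\frac{\alpha}{2} \|\mathbf{w}\|^2 \right) \\ &= \exp\left( -\beta E(\mathbf{w}; \alpha / \beta) \right) \end{aligned} \tag{4.23}$$

    Here $E(\mathbf{w}; \lambda)$ is the regularized error function defined in (4.18), so maximizing the posterior with respect to $\mathbf{w}$ is equivalent to minimizing the regularized error with $\lambda = \alpha / \beta$.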
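The link between the posterior and the regularized error can be checked numerically: the exponent $-\frac{\beta}{2}\sum_n (t_n - y_n)^2 - \frac{\alpha}{2}\|\mathbf{w}\|^2$ should equal $-\beta$ times the regularized error $E(\mathbf{w}; \alpha/\beta)$ of (4.18). The sketch below uses a deliberately tiny stand-in model with two weights; the model, the data, and the values of `alpha` and `beta` are all hypothetical.

```python
import math

def model(x, w):
    """Tiny stand-in network y(x, w) = w[1] * tanh(w[0] * x) (hypothetical)."""
    return w[1] * math.tanh(w[0] * x)

def regularized_error(w, xs, ts, lam):
    """E(w; lambda) of Eq. (4.18)."""
    data_term = 0.5 * sum((t - model(x, w)) ** 2 for x, t in zip(xs, ts))
    return data_term + 0.5 * lam * sum(wi ** 2 for wi in w)

def log_posterior_unnormalized(w, xs, ts, alpha, beta):
    """ln p(t|X,w,beta) + ln p(w), dropping every term that does not depend on w."""
    log_likelihood = -0.5 * beta * sum((t - model(x, w)) ** 2 for x, t in zip(xs, ts))
    log_prior = -0.5 * alpha * sum(wi ** 2 for wi in w)
    return log_likelihood + log_prior

xs, ts = [-1.0, 0.0, 2.0], [-0.5, 0.1, 0.9]   # hypothetical training data
alpha, beta = 0.3, 4.0                        # hypothetical prior and noise precisions
w = [0.7, 1.2]                                # an arbitrary weight vector

lhs = log_posterior_unnormalized(w, xs, ts, alpha, beta)
rhs = -beta * regularized_error(w, xs, ts, alpha / beta)
```

Since the two quantities agree for any `w`, the weight vector that maximizes the posterior is the one that minimizes the regularized error.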