Upgrade to Pro — share decks privately, control downloads, hide ads and more …

PRMLセミナー(第9章)

Sponsored · Your Podcast. Everywhere. Effortlessly. Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
Avatar for gucchi gucchi
May 25, 2019

 PRMLセミナー(第9章)

Avatar for gucchi

gucchi

May 25, 2019
Tweet

More Decks by gucchi

Other Decks in Science

Transcript

  1. 0. ࠓճͷηϛφʔʹ͍ͭͯ ࠓճͷηϛφʔͰ͸ɺPRML ͷୈ 9 ষͷ EM ΞϧΰϦζϜΛ͓࿩͠͠ ͍ͨͱࢥ͍·͢ɻ ·ͨɺ͜ΕΒͷ࿩୊Λઆ໌͢ΔͨΊʹඞཁͳ༧උ஌ࣝΛղઆ͠·͢ɻ

    (PRML 1.2 ֬཰࿦ͱ PRML 2.3 Ψ΢ε෼෍) ͳ͓஫ҙ఺ͱͯ͠ɺຊεϥΠυͷࣜ൪߸ͱ PRML ͷࣜ൪߸͸ҟͳΓ· ͢ͷͰɺ͝஫ҙ͍ͩ͘͞ɻ 2 / 67
  2. ໨࣍ 1. ༧උ஌ࣝ 1-1. ֬཰࿦ (PRML 1.2) 1-2. Ψ΢ε෼෍ (PRML

    2.3) 1-3. ࠞ߹Ψ΢ε෼෍ (PRML 2.3.9) 2. ࠞ߹Ϟσϧͱ EM 2-1. K-means ΫϥελϦϯά (PRML 9.1) 2-2. ࠞ߹Ψ΢ε෼෍ (PRML 9.2) 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ (PRML 9.3) 2-4. Ұൠͷ EM ΞϧΰϦζϜ (PRML 9.4) 3 / 67
  3. 1-1. ֬཰࿦ ύλʔϯೝࣝʹ͓͍ͯɺॏཁͳෆ࣮֬ੑΛఆྔతʹධՁ͢ΔͨΊʹ֬཰ ࿦Λಋೖ͢Δɻ ֬཰ม਺ X, Y Λߟ͑ɺ͜ΕΒ͸ X =

    xi (i = 1, 2, · · · , M)ɺ Y = yj (j = 1, 2, · · · , L) ΛͱΔͱ͠ɺX = xi , Y = yj ͱͳΔ֬཰ (ಉ࣌ ֬཰) Λ p(X = xi , Y = yj ) ͱ͔͘ɻ X = xi ͱͳΔ֬཰ p(X = xi ) ͸ɺp(X = xi , Y = yj ) Λ༻͍ͯҎԼͷ Α͏ʹ͔͚Δɻ(Ճ๏ఆཧ) p(X = xi ) = L ∑ j=1 p(X = xi , Y = yj ) (1.1) ·ͨɺX = xi ͕༩͑ΒΕ্ͨͰɺY = yj ͱͳΔ֬཰ (৚݅෇͖֬཰) Λ p(Y = yj |X = xi ) ͱ͢ΔͱɺҎԼͷΑ͏ͳؔ܎͕ࣜ੒ཱ͢Δɻ(৐๏ ఆཧ) p(X = xi , Y = yj ) = p(Y = yj |X = xi )p(X = xi ) (1.2) 4 / 67
  4. 1-1. ֬཰࿦ ৐๏ఆཧͱಉ࣌֬཰ͷରশੑ p(X, Y ) = p(Y, X) Λ༻͍ΔͱɺϕΠζͷ

    ఆཧ͕ಋ͚Δɻ p(Y |X) = p(X|Y )p(Y ) p(X) (1.3) ͞Βʹɺಉ࣌෼෍ p(X, Y ) ͕ҎԼͷΑ͏ʹपล෼෍ͷੵͰදͤΔ࣌ɺX ͱ Y ͸ಠཱͰ͋Δͱ͍͏ɻ p(X, Y ) = p(X) p(Y ) (1.4) 5 / 67
  5. 1-1. ֬཰࿦ ͜Ε·Ͱ͸཭ࢄతͳ֬཰ม਺ʹ͍ͭͯߟ͖͑ͯͨɻ࣍ʹ࿈ଓతͳ֬཰ ม਺ͷ෼෍ʹ͍ͭͯߟ͑Δɻ ֬཰ม਺ x ͕ (x, x +

    δx) ͷൣғʹೖΔ֬཰͕ δx → 0 ͷ࣌ʹ p(x) δx ͱ ༩͑ΒΕΔ࣌ɺp(x) Λ֬཰ີ౓ͱ͍͏ɻ ͜ͷ࣌ɺม਺ x ͕۠ؒ (a, b) ʹ͋Δ֬཰͸ҎԼͷࣜͰ༩͑ΒΕΔɻ p(x ∈ (a, b)) = ∫ b a p(x) dx (1.5) ·ͨɺ֬཰ͷඇෛੑͱن֨ԽΑΓɺp(x) ͸ҎԼͷੑ࣭Λ࣋ͭɻ p(x) ≥ 0 (1.6) ∫ ∞ −∞ p(x) dx = 1 (1.7) 6 / 67
  6. 1-2. Ψ΢ε෼෍ Ψ΢ε෼෍͸֬཰ີ౓ؔ਺ͳͷͰɺҎԼͷن֨Խ৚݅Λຬͨ͢ɻ ∫ N(x|µ, σ2) dx = 1 (1.9)

    ·ͨɺΨ΢ε෼෍ͷॏཁͳੑ࣭ͱͯ͠ɺx ͷฏۉ஋Λ෼ࢄ͕ͦΕͧΕ µ ͱ σ2 Ͱ༩͑ΒΕΔ͜ͱͰ͋Δɻ E[x] = ∫ ∞ −∞ N(x|µ, σ2)x dx = µ (1.10) var[x] = E[x2] − E[x]2 = σ2 (1.11) 8 / 67
  7. 1-2. Ψ΢ε෼෍ ࣍ʹɺҎԼͷ D ࣍ݩͷϕΫτϧ x ʹର͢ΔଟมྔΨ΢ε෼෍Λಋೖ ͢Δɻ N(x|µ, Σ)

    = 1 (2π)D/2 1 |Σ|1/2 exp { − 1 2 (x − µ)TΣ−1(x − µ) } (1.12) ͜͜Ͱɺµ Λ D ࣍ݩͷฏۉϕΫτϧͱ͠ɺΣ Λ D × D ͷڞ෼ࢄߦྻͱ ͢Δɻ ͜ͷ৔߹Ͱ΋ن֨Խ৚݅ͱɺฏۉͱڞ෼ࢄ͸ҎԼͷੑ࣭Λຬͨ͢ɻ ∫ N(x|µ, Σ) dx = 1 (1.13) E[x] = ∫ N(x|µ, Σ)x dx = µ (1.14) cov[x] = E[(x − E[x])(x − E[x])T] = Σ (1.15) 9 / 67
  8. 1-3. ࠞ߹Ψ΢ε෼෍ ಋೖͨ͠ଟมྔΨ΢ε෼෍Λ࠷໬ਪఆΛ༻͍ͯɺ؍ଌ͞Εͨϓϩοτʹ ౰ͯ͸ΊͯݟΔ͜ͱΛߟ͑Δɻ Ψ΢ε෼෍ͷ୯ๆੑ (෼෍ͷࢁ͕Ұͭ) ͸෼෍Λσʔλʹ౰ͯ͸ΊΔͱ ͖ʹେ͖ͳ੍ݶΛ༩͑Δɻ ྫ͑͹ɺ2 ࣍ݩσʔλ

    (Old Faithful ؒܽઘσʔλ) ͷ྘৭ͷϓϩοτʹ ରͯ͠ɺD = 2 ͷଟมྔΨ΢ε෼෍ (1.12) Λ౰ͯ͸Ίͨͷ͕ҎԼͷά ϥϑͰ͋Δɻ Ψ΢ε෼෍͕୯ๆͰ͋ΔͨΊʹɺϓϩοτͷ 2 ͭͷմΛ͏·͘දݱͰ͖ ͓ͯΒͣɺϓϩοτ͕ଘࡏ͠ͳ͍఺͕؍ଌ͞ΕΔ֬཰͕ߴ͘ͳͬͯ ͍Δɻ 10 / 67
  9. 1-3. ࠞ߹Ψ΢ε෼෍ ͦ͜ͰɺҎԼͷࠞ߹Ψ΢ε෼෍Λಋೖ͢Δɻ p(x|π, µ, Σ) = K ∑ k=1

    πk N(x|µk , Σk ) (1.16) ͜͜Ͱɺπ, µ, Σ ͸ͦΕͧΕ πk , µk , Σk ͷू߹Λද͢ɻ µ, Σ ʹ͸୯ҰΨ΢ε෼෍ͷͱ͖ͱಉ͡Α͏ʹ߆ଋ৚݅͸ͳ͍͕ɺ(1.16) ͷن֨Խ৚݅ΑΓ ∫ p(x|π, µ, Σ) dx = K ∑ k=1 πk ∫ N(x|µk , Σk ) dx = K ∑ k=1 πk = 1 (1.17) ͱͳΓɺπk ʹ͸੍ݶ͕ͭ͘ɻ 11 / 67
  10. 1-3. ࠞ߹Ψ΢ε෼෍ 2 ࣍ݩσʔλ (Old Faithful ؒܽઘσʔλ) ରͯ͠ɺK = 2

    ͷࠞ߹Ψ΢ε ෼෍Λ౰ͯ͸Ίͨͷ͕ҎԼͷάϥϑͰ͋Δɻ ࠞ߹Ψ΢ε෼෍ͷཁૉͰ͋Δ 2 ͭͷΨ΢ε෼෍͕ϓϩοτͷ 2 ͭͷմ Λ͏·͘දݱ͍ͯ͠Δ͜ͱ͕Θ͔Δɻ 12 / 67
  11. 2. ࠞ߹Ϟσϧͱ EM ࣄલ४උͱͯ͠঺հͨࠞ͠߹Ϟσϧ (ࠞ߹Ψ΢ε෼෍) ͸σʔλͷΫϥ ελϦϯάʹ࢖༻Ͱ͖Δɻ ·ͣ 2-1 (PRML

    9.1) Ͱ͸ɺΫϥελϦϯάͱ͸Կ͔Λઆ໌͠ͳ͕Βɺ K-means ΞϧΰϦζϜΛ༻͍ͨඇ֬཰తͳΫϥελϦϯάͷํ๏ʹͭ ͍ͯઆ໌͢Δɻ ࣍ʹ 2-2 (PRML 9.2) Ͱ͸ɺજࡏม਺Λಋೖ͠ɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ ਪఆΛߦ͏͜ͱͰɺൃݟ๏తʹ EM ΞϧΰϦζϜΛಋೖ͢Δɻ 2-3 (PRML 9.3) Ͱ͸ɺҰൠతͳ EM ΞϧΰϦζϜΛಋೖ͠ɺͦΕΛࠞ ߹Ψ΢ε෼෍ʹద༻͢Δ͜ͱͰ 2-2 (PRML 9.2) ͷ݁ՌΛ࠶ݱ͢Δ͜ͱ ΛΈΔɻ ࠷ޙʹ 2-4 (PRML 9.4) Ͱ͸ɺ2-3 (PRML 9.3) Ͱಋೖͨ͠Ұൠతͳ EM ΞϧΰϦζϜ͕໬౓ؔ਺Λ࠷େԽ͢Δ͜ͱΛࣔ͢ɻ 13 / 67
  12. 2-1. K-means ΫϥελϦϯά ·ͣ͸͡ΊʹɺD ࣍ݩϢʔΫϦου্ۭؒͷ֬཰ม਺ x ʹ͓͍ͯɺN ݸͷ؍ଌσʔλ {x1 ,

    · · · , xN } ͕͋Δͱ͢Δɻ ΫϥελϦϯάͷ໨త͸͜ΕΒͷ؍ଌσʔλ {x1 , · · · , xN } Λ K ݸͷ Ϋϥελʔ (σʔλू߹ͷάϧʔϓ) ʹׂΓৼΔ͜ͱͰ͋Δɻ ɹ K-means ΫϥελϦϯάͰ͸ɺ·ͣϓϩτλΠϓͱݺ͹ΕΔ K ݸͷ D ࣍ݩϕΫτϧ µk (k = 1, · · · , K) Λ༻ҙ͢Δɻ (࣮૷Ͱ͸ɺµk ͷॳظ஋ͱͯ͠ɺϥϯμϜͳϕΫτϧͱͨ͠Γ͢Δɻ) ޙͰΘ͔ΔΑ͏ʹɺK-means ΫϥελϦϯάద༻ޙɺϓϩτλΠϓ µk ͸Ϋϥελʔ k ͷத৺Λද͢ϕΫτϧʹͳΔ͜ͱ͕Θ͔Δɻ 14 / 67
  13. 2-1. K-means ΫϥελϦϯά ࣍ʹɺ֤σʔλ xn ʹରͯ͠ɺೋ஋ม਺ rnk ∈ {0, 1}

    Λಋೖ͢Δɻ ͜ͷม਺͸σʔλ xn ͕ K ݸ͋ΔΫϥελʔͷ಺ɺͲͷΫϥελʔʹ ׂΓ౰ͯΒΕΔ͔Λද͢ม਺Ͱ͋Δɻ ·ͨɺΫϥελʔ k ͷఴ͑ࣈʹରͯ͠ن֨Խ ∑ k rnk = 1 Λຬͨ͢ͱ ͢Δɻ ྫ͑͹ɺ͋Δσʔλ xn ʹରͯ͠ɺ͋Δ k ʹରͯ͠ rnk = 1 Ͱ͋Γɺ j ̸= k ͳΔ j ʹରͯ͠ rnj = 0 Ͱ͋Ε͹ɺσʔλ xn ͸Ϋϥελʔ k ʹ ଐ͢Δ͜ͱΛҙຯ͢Δɻ 15 / 67
  14. 2-1. K-means ΫϥελϦϯά ͜ΕΒͷ 2 छྨͷม਺Λ༻ҙͯ͠ɺҎԼͷ໨తม਺Λ࠷খʹ͢Δͷ͕ɺ K-means ΫϥελϦϯάͰ͋Δɻ J =

    N ∑ n=1 K ∑ k=1 rnk ∥xn − µk ∥2 = N ∑ n=1 jn (2.1) ͜͜Ͱɺ jn = K ∑ k=1 rnk ∥xn − µk ∥2 (2.2) ͱఆٛͨ͠ɻ 16 / 67
  15. 2-1. K-means ΫϥελϦϯά ͜ͷ໨తؔ਺ J Λ࿪Έई౓ͱݺͿɻ͜ͷ J ͷղऍΛߟ͑Δɻ ·ͣɺ͋Δσʔλ xn

    ͕Ϋϥελʔ k ʹଐ͍ͯ͠Δͱ͢Δɻ ͭ·Γɺ͋Δ k ʹରͯ͠ rnk = 1 Ͱ͋Γɺj ̸= k ͳΔ j ʹରͯ͠ rnj = 0 Ͱ͋Δͱ͖ jn ͸ (2.2) ͸ jn = ∥xn − µk ∥2 (2.3) ͱͳΔɻ (2.3) ͸σʔλ xn ͱΫϥελʔ k ͷϓϩτλΠϓ µk ͷڑ཭ͷೋ৐Ͱ ͋Δɻ (2.1) ΑΓɺJ ͸ jn ͷ࿨ͳͷͰɺJ ͸֤σʔλ xn ͕ଐ͢ΔΫϥελʔ ͷϓϩτλΠϓͱͷڑ཭ͷೋ৐ͷ࿨ͱղऍͰ͖Δɻ 17 / 67
  16. 2-1. K-means ΫϥελϦϯά ͦΕͰ͸ɺ(2.1) ͷӈลΛ {rnk } ͱ {µk }

    ʹ͍ͭͯ࠷খԽ͢Δɻ ͜͜Ͱ͸ɺrnk ͱ µk ͷͦΕͧΕʹରͯ͠࠷খԽΛߦ͏ 2 ͭͷεςοϓ Λަޓʹ܁Γฦ͢͜ͱͰ࣮ݱ͢Δɻ ͭ·Γɺ·ͣ µk ͷॳظ஋ͱͯ͠ϥϯμϜͳϕΫτϧΛઃఆ͠ɺҎԼͷ 2 εςοϓΛ܁Γฦ͢ɻ 1. µk Λݻఆͭͭ͠ɺrnk ʹ͍ͭͯ J Λ࠷খԽ͢Δɻ 2. rnk Λݻఆͭͭ͠ɺµk ʹ͍ͭͯ J Λ࠷খԽ͢Δɻ ࣮ࡍ͸ɺ͜ͷ 2 εςοϓΛ࠶ׂΓ౰͕ͯى͜Βͳ͘ͳΔ·Ͱߦ͏ɻ(܁ Γฦ͠ l ճ໨ͷ rnk ͷ஋ͱ܁Γฦ͠ l + 1 ճ໨ͷ rnk ͷ஋͕͢΂ͯͷ n ͱ k Ͱ౳͘͠ͳΔͱऴྃɻ) ·ͨɺ͋Β͔͡Ί܁Γฦ͠ճ਺ΛܾΊ͓ͯ͘৔߹΋͋Δɻ 18 / 67
  17. 2-1. K-means ΫϥελϦϯά ͦΕͰ͸ɺ࣮ࡍʹεςοϓ 1 ͔ΒߦͬͯΈΔɻ ·ͣɺJ Λ rnk ʹ͍ͭͯ࠷খԽ͢Δʹ͸ɺͦΕͧΕ

    n ʹର͢Δ jn Λ࠷ খʹ͢Δ rnk ΛٻΊΕ͹Α͘ɺ jn = K ∑ k=1 rnk ∥xn − µk ∥2 (2.4) Ͱ͋Δ͔Βɺ͢΂ͯͷ k ͷதͰ ∥xn − µk ∥2 ͕࠷΋খ͍͞ k ʹର͢Δ rnk Λ 1 ʹ͢Ε͹Α͍ɻ·ͨɺͦΕҎ֎ͷ j ̸= k ʹର͢Δ rjk ͸θϩʹ ͢Ε͹Α͍ɻ ͭ·ΓࣜͰද͢ͱɺ rnk = { 1 k = argminj ∥xn − µj ∥2 0 otherwise (2.5) ͱͳΔɻ 19 / 67
  18. 2-1. K-means ΫϥελϦϯά ࣍ʹεςοϓ 2 Λߦ͏ɻ εςοϓ 1 ͰܾΊͨ rnk

    Λݻఆͯ͠ɺ(2.1) Λ µk ʹରͯ͠࠷খԽ͢Δ ͨΊʹ J Λ µk Ͱඍ෼͢Δͱ ∂J ∂µk = N ∑ n=1 rnk ∂ ∂µk ∥xn − µk ∥2 = −2 N ∑ n=1 rnk (xn − µk ) = −2 ( N ∑ n=1 rnk xn − µk N ∑ n=1 rnk ) (2.6) ͱͳΔɻ 20 / 67
  19. 2-1. K-means ΫϥελϦϯά (2.6) ΑΓɺJ Λ࠷খԽ͢Δ µk ͸ µk =

    N ∑ n=1 rnk xn N ∑ n=1 rnk (2.7) ͱͳΔɻ (2.7) ͷӈลͷ෼฼͸Ϋϥελʔ k ʹଐ͍ͯ͠Δσʔλͷ૯਺Ͱ͋Γɺ (2.7) ͷӈลͷ෼ࢠ͸Ϋϥελʔ k ʹଐ͍ͯ͠Δσʔλ (ϕΫτϧ) ͷ࿨ Ͱ͋Δɻ ͭ·Γɺµk ͸Ϋϥελʔ k ͷத৺Λද͢ϕΫτϧͰ͋Δɻ 21 / 67
  20. 2-1. K-means ΫϥελϦϯά ·ͣɺάϥϑ͸ 2 ࣍ݩͷೖྗۭؒͰɺ؍ଌ஋Ͱ͋Δϓϩοτ͕ඳ͔Εͯ ͍Δɻ άϥϑ (a) ͷ

    2 ͭͷʷ͕ϓϩτλΠϓ µ1 ͱ µ2 ͷॳظ஋Ͱ͋Δɻ (͜ͷྫͰ͸ɺϥϯμϜͰ͸ͳ͘ɺΘ͟ͱΫϥελʔͷத৺ʹͳΓ͏Δ ఺͔Βେ͖ͣ͘Βͨ͠৔ॴΛॳظ஋ͱ͍ͯ͠Δɻ) άϥϑ (b) ͸εςοϓ 1 ͷ࣮ࢪ݁ՌͰɺµk Λݻఆͯ͠ɺrnk ΛܾΊ͓ͯ Γɺσʔλ͸ µ1 ͱ µ2 ͷ͍ۙํͷΫϥελʔʹଐ͢Δɻ άϥϑ (c) ͸εςοϓ 2 ͷ࣮ࢪ݁ՌͰɺrnk Λݻఆͯ͠ɺµk Λ (2.7) Λ ༻͍ܾͯΊ͓ͯΓɺµk ͸ͦΕͧΕͷΫϥελʔʹଐ͢ΔϕΫτϧͷฏ ۉ஋ΛͱΔɻ άϥϑ (d)ʙ(f) ͸ 2 ͭͷεςοϓͷ 2 ճ໨ͷ܁Γฦ͠Ͱɺάϥϑ (g)ʙ (i) ͸ 2 ͭͷεςοϓͷ 3 ճ໨ͷ܁Γฦ͠Ͱ͋Δɻ άϥϑ (i) ΛΈΔͱɺάϥϑ (b) ʹൺ΂ͯɺద੾ͳΫϥελʔͷׂΓ౰ ͕ͯͰ͖͍ͯΔ͜ͱ͕ݟͯͱΕΔɻ 23 / 67
  21. 2-1. K-means ΫϥελϦϯά Ҏ্͕ K-means ΫϥελϦϯάͷղઆͰ͋Δɻ K-means ΫϥελϦϯάͷಛ௃͸ɺσʔλΛ͋Δ 1 ͭͷΫϥελʔʹඞ

    ׂͣΓ౰ͯΔͱ͜ΖͰ͋Δɻ(ϋʔυׂΓ౰ͯͱ͍͏ɻ) ࠷ऴతͳ µk ʹ͍ۙσʔλ఺Ͱ͋Ε͹ɺͦͷΑ͏ͳׂΓ౰ͯ͸ਖ਼ͦ͠͏ Ͱ͋Δ͕ɺΫϥελʔͷڥքઢ্ʹ͋ΔΑ͏ͳσʔλ఺ʹରͯ͠ɺ1 ͭ ͷΫϥελʔʹඞׂͣΓ౰ͯΔ͜ͱ͕ద੾Ͱ͋Δ͔Ͳ͏͔͸Θ͔Β ͳ͍ɻ ͦ͜ͰɺׂΓ౰ͯʹෆ࣮֬ੑΛ࣋ͨͤΔΫϥελϦϯάख๏Λߟ͑Δɻ (ιϑτׂΓ౰ͯͱ͍͏ɻ) ͦͷͨΊʹ͸֬཰తͳΫϥελϦϯάख๏Λߦ͏ඞཁ͕͋Δɻ 24 / 67
  22. 2-2. ࠞ߹Ψ΢ε෼෍ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷผͷදݱͱͯ͠ɺજࡏม਺ΛؚΉࠞ߹Ψ΢ ε෼෍ͷදݱΛಋग़͢Δɻ ͜ͷજࡏม਺ΛؚΉදݱͰ࠷໬ਪఆ๏Λߦ͏ࡍɺڧྗͳ EM ΞϧΰϦζ Ϝ͕࢖༻Ͱ͖Δ͜ͱΛΈΔɻ ɹ ·ͣɺK

    ࣍ݩͷ 2 ஋֬཰ม਺ z Λಋೖ͢Δɻ ͜ͷϕΫτϧͷ੒෼Λ zk (k = 1, · · · , K) ͱ͠ɺ͜ͷ੒෼͸ zk ∈ {0, 1} ͱ ∑ k zk = 1 Λຬͨ͢ͱ͢Δɻ ͭ·Γ͜ͷ৚݅͸ɺ͋Δ k ʹ͓͍ͯ zk = 1 Ͱ͋Γɺͦͷଞͷ੒෼͸ zj(̸=k) = 0 ͱͳΔ͜ͱΛද͍ͯ͠Δɻ 25 / 67
  23. 2-2. ࠞ߹Ψ΢ε෼෍ ࣍ʹɺಋೖͨ֬͠཰ม਺ z ͷपล֬཰ p(z) ΛҎԼͷΑ͏ʹఆٛ͢Δɻ p(zk = 1)

    = πk (k = 1, · · · , K) (2.8) ͜͜Ͱɺp(zk = 1) ͸֬཰ม਺ z ͷ k ੒෼͕ zk = 1 Ͱ͋Γɺͦͷଞͷ੒ ෼͸ zj(̸=k) = 0 Ͱ͋ΔΑ͏ͳ஋Ͱͷ֬཰ p(z) Λද͢ɻ ͜ΕΑΓɺن֨Խ৚͔݅Βɺπk ͸ ∑ z p(z) = K ∑ k=1 p(zk = 1) = K ∑ k=1 πk = 1 (2.9) ͱͳΓɺ·ͨ֬཰ͷඇෛੑ (πk ≥ 0) ͱ (2.9) ΑΓ 0 ≤ πk ≤ 1 (2.10) Λຬͨ͢ɻ 26 / 67
  24. 2-2. ࠞ߹Ψ΢ε෼෍ ෼෍ (2.8) ͸ҎԼͷΑ͏ʹ·ͱΊͯॻ͚Δɻ p(z) = K ∏ k=1

    πzk k (2.11) z ͷ k ੒෼͕ zk = 1 Ͱ͋Γɺͦͷଞͷ੒෼͸ zj(̸=k) = 0 Ͱ͋ΔΑ͏ͳ ͱ͖Λߟ͑Δͱɺ p(zk = 1) = K ∏ k=1 πzk k = π0 1 ×· · ·×π0 k−1 ×π1 k ×π0 k+1 ×· · ·×π0 K = πk (2.12) ͱͳΓɺ͔֬ʹ (2.12) ͸ (2.8) ͱҰக͢Δɻ 27 / 67
  25. 2-2. ࠞ߹Ψ΢ε෼෍ ࣍ʹ D ࣍ݩͷ֬཰ม਺ x ʹ͓͍ͯɺz ͕༩͑ΒΕͨͱ͖ͷ x ͷ৚݅෇

    ͖֬཰ΛҎԼͷΑ͏ʹΨ΢ε෼෍ͰԾఆ͢Δɻ p(x|zk = 1) = N(x|µk , Σk ) (k = 1, · · · , K) (2.13) ͜Ε΋ (2.11) ͱಉ͡Α͏ʹҎԼͷΑ͏ʹ·ͱΊͯॻ͚Δɻ p(x|z) = K ∏ k=1 N(x|µk , Σk )zk (2.14) (2.11) ͱ (2.14) ΑΓɺपล෼෍ p(z) ͱ৚݅෇͖֬཰෼෍ p(x|z) Λఆٛ ͨ͠ͷͰɺ֬཰ͷՃ๏ఆཧͱ৐๏ఆཧʹΑΓɺपล෼෍ p(x) ͕ҎԼͷ Α͏ʹܭࢉͰ͖Δɻ p(x) = ∑ z p(x, z) = ∑ z p(x|z)p(z) = ∑ z K ∏ k=1 ( πk N(x|µk , Σk ) )zk = K ∑ k=1 πk N(x|µk , Σk ) (2.15) 28 / 67
  26. 2-2. ࠞ߹Ψ΢ε෼෍ (2.15) ΑΓɺपล෼෍ p(x) ͸ࠞ߹Ψ΢ε෼෍Ͱ͋Δ͜ͱ͕Θ͔Δɻ ͜ΕΑΓɺજࡏม਺ z ΛؚΉࠞ߹Ψ΢ε෼෍ͷผͷදݱΛಋग़Ͱ͖ͨɻ ͜͜ͰɺॏཁͳྔͰ͋Δෛ୲཰Λಋೖ͢Δɻ

    ෛ୲཰ͱ͸ɺx ͕༩͑ΒΕͨͱ͖ͷ z ͷ৚݅෇͖֬཰Ͱ͋Γɺ γk (x) = p(zk = 1|x) ͱॻ͘ɻ γk (x) ͸ (2.8) ͱ (2.13) ͱϕΠζͷఆཧ (1.3) ΑΓɺ γk (x) = p(zk = 1|x) = p(zk = 1)p(x|zk = 1) K ∑ j=1 p(zj = 1)p(x|zj = 1) = πk N(x|µk , Σk ) K ∑ j=1 πj N(x|µj , Σj ) (2.16) ͱͳΔɻ 29 / 67
  27. 2-2. ࠞ߹Ψ΢ε෼෍ ͜ͷෛ୲཰ͷղऍΛྫΛ༻͍ͯઆ໌͢Δɻ(ҎԼͷਤ͸ 3 ͭͷΨ΢ε෼ ෍ͷࠞ߹ (K = 3) ͔Βੜ੒ͨ͠

    500 ఺Ͱ͋Δɻ) ·ͣɺಉ࣌෼෍ p(x, z) = p(x|z)p(z) ͔Βαϯϓϧ xn ͱ zn Λੜ੒ͨ͠ ͱ͢Δɻ άϥϑ (a) ͸αϯϓϧ xn Λϓϩοτ͠ɺରԠ͢Δ zn ͷ 3 ͭͷঢ়ଶɺ੺ (k = 1)ɺ྘ (k = 2)ɺ੨ (k = 3) Ͱ৭͕ృΒΕ͍ͯΔɻ άϥϑ (b) ͸ੜ੒͞Εͨαϯϓϧ xn ͱ zn ͷ಺ɺzn Λࣺͯͯɺxn Λϓ ϩοτͨ͠ਤͰ͋Δɻ 30 / 67
  28. 2-2. ࠞ߹Ψ΢ε෼෍ ·ͨɺάϥϑ (c) ͸αϯϓϧ xn ͕ಘΒΕͨΒɺ(2.16) ͔ΒରԠ͢Δෛ ୲཰ γk

    (xn ) (k = 1, 2, 3) Λܭࢉ͠ɺରԠ͢Δσʔλ఺ xn Λ γk (xn ) ʹ ൺྫ͢Δྔͷ੺ (k = 1)ɺ྘ (k = 2)ɺ੨ (k = 3) Ͱ৭Λ࢖ͬͯృͬͯ ͍Δɻ ྫ͑͹ɺγ1 (xn ) = 1.0 ͷ఺͸੺ͰృΒΕɺγ2 (xn ) = γ3 (xn ) = 0.5 ͷ఺ ͸྘ͱ੨Λಉྔࠞͥͨ৭ͰృΔɻ άϥϑ (a) ͱ (c) ΑΓɺෛ୲཰͸఺͕ 3 ͭͷΨ΢ε෼෍ͷ಺ͲΕ͔Βੜ ੒͞ΕΔͷ֬཰Λ༩͑Δɻ 31 / 67
  29. 2-2-1. ࠷໬ਪఆ ͦΕͰ͸ɺ؍ଌͨ͠σʔλͷू߹ {x1 , · · · , xN

    } Λࠞ߹Ψ΢ε෼෍ʹ౰ ͯ͸ΊΔͨΊʹɺ࠷໬ਪఆ๏Λ࣮ࢪ͠Α͏ɻ ·ͣɺσʔλͷू߹Λͻͱ·ͱΊʹ N × D ߦྻ X = (x1 , · · · , xN )T Λ ఆٛ͢Δͱɺσʔλ͕ (2.15) ͔Βಠཱʹੜ੒͞Εͨͱ͢Δͱɺ p(X|π, µ, Σ) = N ∏ n=1 [ K ∑ k=1 πk N(xn |µk , Σk ) ] (2.17) ͱͳΔɻ ͜ΕΑΓɺର਺໬౓͸ ln p(X|π, µ, Σ) = N ∑ n=1 ln { K ∑ k=1 πk N(xn |µk , Σk ) } (2.18) ͱͳΔɻ 32 / 67
  30. 2-2-1. ࠷໬ਪఆ ͜͜Ͱɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ਪఆͰ͸ɺ୯ҰͷΨ΢ε෼෍ͷ࠷໬ਪఆ Ͱ͸ى͜Βͳ͍ಛ༗ͷ໰୊఺͕͋Δ͜ͱʹ஫ҙ͢΂͖Ͱ͋Δɻ ·ͣɺࠞ߹Ψ΢ε෼෍ͷ֤ཁૉͷڞ෼ࢄߦྻ͕ Σk = σ2 k I

    ͷͱ͖Λߟ ͑Δɻ ͜ͷͱ͖ɺର਺໬౓͸ ln p(X|π, µ, Σ) = N ∑ n=1 ln { K ∑ k=1 πk N(xn |µk , σ2 k I) } (2.19) ͱͳΔɻ ·ͨɺࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁૉͷฏۉύϥϝʔλ µj ͕ n ݸ໨ͷ σʔλͱ౳͍͠ͱԾఆ͢Δɻ(µj = xn ) 33 / 67
  31. 2-2-1. ࠷໬ਪఆ ͜ͷͱ͖ɺn ݸ໨σʔλ఺͸ର਺໬౓ͷࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁ ૉʹ N(xn |µj ,

    σ2 j I) = N(xn |xn , σ2 j I) = 1 (2π)D/2 1 σD j (2.20) ͱ͍͏ܗͰد༩͢Δɻ ͜Ε͸ σj → 0 Ͱൃࢄ͠ɺର਺໬౓ (2.20) ΋ൃࢄ͢Δɻ ͭ·Γ࠷໬ਪఆͰ͸ର਺໬౓Λ࠷େʹ͢ΔΑ͏ʹύϥϝʔλΛܾΊΔ ͷͰɺ͜ͷࣄ࣮͸ͲΜͳσʔλ఺ͷू߹ {x1 , · · · , xN } ͕༩͑ΒΕͯ΋ɺ ࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁૉͷฏۉύϥϝʔλ µj Λ n ݸ໨ͷσʔλ xn ͱ౳͘͠͠ɺj ൪໨ͷཁૉͷඪ४ภࠩΛ σj = 0 ͱ͢Ε͹ɺ͋ͱͷύ ϥϝʔλ͸ద౰Ͱ΋ର਺໬౓Λ࠷େʹ͢Δ͜ͱ͕ग़དྷΔɻ ͭ·ΓͲͷΑ͏ͳσʔλ఺ͷू߹͕༩͑ΒΕͯ΋ɺ࠷໬ਪఆͷ݁Ռɺxn ͸ࠞ߹Ψ΢ε෼෍ͷ j ൪໨ͷཁૉ͔Βੜ੒͞Ε (N(xn |µj , σ2 j I) → ∞ ΑΓ)ɺͦΕҎ֎ͷσʔλ఺͸ࠞ߹Ψ΢ε෼෍ͷͲͷཁૉ͔Βੜ੒͞Ε ͯ΋ྑ͍ (ର਺໬౓Λ࠷େʹ͢Δ) ͱ͍͏ಛҟతͳঢ়گʹ͓͍ͪΔɻ 34 / 67
  32. 2-2-1. ࠷໬ਪఆ ͜Ε͸୯ҰͷΨ΢ε෼෍ͷ࠷໬ਪఆͰ͸ى͜Βͳ͔ͬͨɻ ୯ҰͷΨ΢ε෼෍ͷ໬౓ؔ਺͸ p(X|µ, Σ) = N ∏ n=1

    N(xn |µ, σ2I) (2.21) Ͱ͋Γɺฏۉύϥϝʔλ µ ͕ n ݸ໨ͷσʔλͱ౳͍͠ͱ͢Δ (µ = xn ) ͱɺ p(X|µ, Σ) = N ∏ n=1 N(xn |xn , σ2I) (2.22) ͱͳΔɻ ͜ͷঢ়گͰ σ → 0 ͱ͢Δͱɺ͔֬ʹ (2.22) ͷ n ൪໨ͷҼࢠ͸ൃࢄ͢Δ ((2.20) ΑΓႈ৐ͷൃࢄ) ͕ɺn ൪໨Ҏ֎ͷҼࢠࢦ਺తʹݮਰ͠ 0 ʹऩଋ ͢Δɻ Αͬͯɺσ → 0 ͱͯ͠΋໬౓ؔ਺ (2.21) ൃࢄͤͣɺ࠷খͷ 0 ʹͳΔͷ Ͱɺઌ΄Ͳઆ໌ͨ͠ಛҟతͳঢ়گ͸ࠞ߹Ψ΢ε෼෍ಛ༗ͷ໰୊Ͱ͋Δɻ 35 / 67
  33. 2-2-1. ࠷໬ਪఆ ·ͨɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ਪఆղ͸ղੳతʹٻΊΔ͜ͱ͕Ͱ͖ͳ͍ͷ Ͱɺ൓෮๏Λ༻͍ͯਪఆ͢Δ͜ͱΛߟ͑Δɻ ͦͷํ๏ͷҰͭͱͯ͠ɺχϡʔϥϧωοτͰ༻͍ΒΕΔΑ͏ͳޯ഑๏͕ ͋Δɻ ͨͩ͠ɺࠓճ͸ EM ΞϧΰϦζϜͱݺ͹ΕΔ൓෮๏Λ༻͍ͯਪఆ͢Δํ ๏Λߟ͑Δɻ

    ·ͨɺઌ΄Ͳઆ໌ͨࠞ͠߹Ψ΢ε෼෍ಛ༗ͳපతͳ໰୊͸ EM ΞϧΰϦ ζϜΛ༻͍ͯ΋ղܾͰ͖Δ໰୊Ͱ͸ͳ͍͜ͱʹ஫ҙɻ ͳͷͰ͜ͷষͰ͸ɺEM ΞϧΰϦζϜͳͲΛ༻͍ͯ൓෮తʹਪఆ͍ͯ͠ Δͱ͖ʹɺ΋͠ର਺໬౓͕ൃࢄͦ͠͏ͳΒɺͦͷฏۉύϥϝʔλ µj Λ ϥϯμϜʹઃఆ͠ɺΣj ͷ੒෼Λ͋Δఔ౓େ͖͍஋ʹͯ͠൓෮Λ࠶։͢ Δͱ͍͏ൃݟ๏తͳରॲ๏Λߦ͏͘Β͍ʹͱͲΊ͓ͯ͘ɻ 10 ষͰ͸ɺϕΠζతͳΞϓϩʔνΛߦ͏͜ͱʹΑΓɺ͜ͷපతͳ໰୊ ͸ى͜Βͳ͘ͳΔ͜ͱΛΈΔɻ 36 / 67
  34. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷ࠷໬ਪఆΛ EM ΞϧΰϦζϜΛ༻͍ͯղ͘ ͜ͱΛߟ͑Δɻ ·ͣ͸ɺର਺໬౓ (2.18)

    Λฏۉ µk ͱ෼ࢄ Σk ͱࠞ߹܎਺ πk Ͱඍ෼͠ ͯ 0 ͱͳΔࣜΛॻ͖Լͦ͏ɻ ฏۉ µk Ͱඍ෼͢Δͱ ∂ ∂µk ln p(X|π, µ, Σ) = N ∑ n=1 ∂ ∂µk ln { K ∑ j=1 πj N(xn |µj , Σj ) } = N ∑ n=1 πk ∑ j πj N(xn |µj , Σj ) ∂ ∂µk N(xn |µk , Σk ) = N ∑ n=1 πk N(xn |µk , Σk ) ∑ j πj N(xn |µj , Σj ) Σ−1 k (xn − µk ) (2.23) ͱͳΔɻ(࠷ޙͷΠίʔϧ͸Ψ΢ε෼෍ͷදࣜ (1.12) Λ༻͍ͨ) 37 / 67
  35. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͜͜Ͱɺෛ୲཰ (2.16) Λ༻͍Δͱɺର਺໬౓ͷ µk ඍ෼͸ ∂

    ∂µk ln p(X|π, µ, Σ) = N ∑ n=1 γk (xn )Σ−1 k (xn − µk ) =Σ−1 k [ N ∑ n=1 γk (xn )xn − µk N ∑ n=1 γk (xn ) ] (2.24) ͱͳΓɺର਺໬౓ͷ µk ඍ෼Λθϩʹͨࣜ͠͸ µk = 1 Nk N ∑ n=1 γk (xn )xn (2.25) ͱͳΔɻ͜͜Ͱɺ Nk = N ∑ n=1 γk (xn ) (2.26) ͱఆٛͨ͠ɻ 38 / 67
  36. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͜͜Ͱɺલʹड़΂ͨෛ୲཰ͷղऍʹΑΓɺ(2.26) ͷ Nk ͸ k ൪໨ͷΫϥ

    ελʔʹׂΓ౰ͯΒΕΔ༗ޮσʔλ਺Λҙຯ͓ͯ͠Γɺ͞Βʹ (2.25) ΑΓ µk ͸σʔλ఺ͷॏΈ෇͖ฏۉͰ͋Γɺෛ୲཰ γk (xn ) ͕େ͖͍ σʔλ xn ΄Ͳ µk ʹେ͖͘د༩͢Δܗʹͳ͍ͬͯΔɻ ࣍ʹର਺໬౓ͷڞ෼ࢄ Σk ඍ෼Λθϩͱ͓͘ͱ Σk = 1 Nk N ∑ n=1 γk (xn )(xn − µk )(xn − µk )T (2.27) ͱͳΓɺฏۉ (2.25) ͱಉ͡ղऍ͕Ͱ͖Δɻ ࠷ޙʹର਺໬౓Λࠞ߹܎਺ πk Ͱඍ෼͢Δ͕ɺπk ͸ن֨Խ৚݅ (2.9) Λ ຬͨ͢΂͖ͳͷͰɺϥάϥϯδϡະఆ৐਺๏Λ༻͍ͯɺ ln p(X|π, µ, Σ) + λ ( K ∑ k=1 πk − 1 ) (2.28) Λ πk ͱະఆ৐਺ λ Ͱඍ෼ͯ͠ 0 ͱ͢Ε͹ྑ͍͜ͱ͕Θ͔Δɻ 39 / 67
  37. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ πk Ͱඍ෼ͨ͠ྔΛ 0 ͱ͓͘ͱ N ∑

    n=1 N(xn |µk , Σk ) ∑ j πj N(xn |µj , Σj ) + λ = 0 → 1 πk N ∑ n=1 γk (xn ) + λ = 0 →πk = − 1 λ N ∑ n=1 γk (xn ) (2.29) ͱͳΔɻ ·ͨɺλ Ͱඍ෼ͯ͠ 0 ͱͨࣜ͠͸ (2.28) ΑΓɺ੍໿৚݅ K ∑ k=1 πk = 1 (2.30) ͱͳΔɻ 40 / 67
  38. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ৚݅ (2.30) ʹ (2.29) Λ୅ೖ͢Δͱ −

    1 λ N ∑ n=1 K ∑ k=1 γk (xn ) = 1 (2.31) ͱͳΓɺ(2.16) ͷෛ୲཰ͷఆٛΑΓɺ K ∑ k=1 γk (xn ) = 1 (2.32) Λຬ͔ͨ͢Βɺ(2.31) ͸ − 1 λ N ∑ n=1 1 = 1 →λ = −N (2.33) ͱͳΔɻ 41 / 67
  39. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ (2.33) Λ༻͍Δͱɺ(2.29) ͸ πk = 1

    N N ∑ n=1 γk (xn ) = Nk N (2.34) ͱͳΓɺπk ͸શσʔλ਺ͷΫϥελʔ k ʹଐ͢Δ༗ޮσʔλ਺ͷׂ߹ ʹͳΔ͜ͱ͕Θ͔ͬͨɻ ͜͜ͰٻΊͨ (2.25) ͱ (2.27) ͱ (2.34) ͸ཅͳղΛ༩͑ͳ͍͜ͱΛ஫ҙ ͢΂͖Ͱ͋Δɻ ͭ·ΓɺͦΕͧΕͷࣜͷӈล͸ෛ୲཰ʹґଘ͍ͯͯ͠ɺ(2.16) ΑΓෛ୲ ཰͸ µk ͱ Σk ͱ πk ͷؔ਺Ͱ͋Δ͔ΒͰ͋Δɻ 42 / 67
  40. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ΑͬͯɺҎԼͷ൓෮తͳख๏ʹΑΓɺର਺໬౓ͷ஋Λେ͖͍ͯ͘͘͠ɻ 1. µk ͱ Σk ͱ

    πk ΛॳظԽ͠ɺର਺໬౓ (2.18) ͷॳظ஋Λܭࢉ͢Δ 2. (E εςοϓ) ݱࡏͷύϥϝʔλΛ༻͍ͯҎԼͷෛ୲཰Λܭࢉ͢Δ γk(xn) = πkN (xn|µk, Σk) K ∑ j=1 πj N (xn|µj , Σj ) (2.35) 3. (M εςοϓ) ݱࡏͷෛ୲཰Λ༻͍ͯɺύϥϝʔλΛ࠶ܭࢉ͢Δ µnew k = 1 Nk N ∑ n=1 γk(xn)xn (2.36) Σnew k = 1 Nk N ∑ n=1 γk(xn)(xn − µnew k )(xn − µnew k )T (2.37) πnew k = Nk N (2.38) ͨͩ͠ɺ Nk = N ∑ n=1 γk(xn) (2.39) 4. ର਺໬౓ ln p(X|πnew, µnew, Σnew) = N ∑ n=1 ln { K ∑ k=1 πnew k N (xn|µnew k , Σnew k ) } (2.40) Λܭࢉ͠ɺมԽྔ͕ᮢ஋ΑΓ΋খ͔ͬͨ͞Βऴྃ͠ɺେ͖͔ͬͨΒεςοϓ 2 ʹ໭Δ 43 / 67
  41. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ࠷ޙʹɺҎલ k-means ΞϧΰϦζϜͷྫͰ༻͍ͨσʔλ (Old Faithful ؒܽઘσʔλ)

    ͱಉ͡σʔλʹରͯ͠ɺࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦ ζϜΛదԠͨ͠ਤΛ঺հ͢Δɻ 44 / 67
  42. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ·ͣɺάϥϑ (a) ͷ྘ͷ఺͸σʔλ఺Λද͠ɺ੨ؙ (k = 1)

    ͱ੺ؙ (k = 2) ͕ࠞ߹Ψ΢ε෼෍ͷͦΕͧΕͷཁૉͱͳΔΨ΢ε෼෍Ͱ͋Γɺ ͦͷΨ΢ε෼෍ͷத৺͕ͦΕͧΕͷฏۉϕΫτϧ µk Ͱ͋ΓɺҎલ k-means ΞϧΰϦζϜͰߦͬͨ࣌ͷϓϩτλΠϓͱಉ͡஋ʹॳظԽ ͨ͠ɻ ·ͨɺͦΕͧΕͷΨ΢ε෼෍ͷڞ෼ࢄ͸౳ํత Σk = σ2 k I ʹॳظԽ ͨ͠ɻ άϥϑ (b) ͸ॳΊͯͷ E εςοϓͷ݁ՌͰ͋Γɺෛ୲཰ͷܭࢉ݁ՌΛද ͍ͯ͠Δɻ ͭ·Γɺ֤σʔλ఺ xn Λ γk (xn ) ʹൺྫ͢Δྔͷ੨ (k = 1) ͱ੺ (k = 2) Ͱ৭Λ࢖ͬͯృ͍ͬͯΔɻ ྫ͑͹ɺγ1 (xn ) = 1.0 ͷ఺͸੺ͰృΒΕɺγ1 (xn ) = γ2 (xn ) = 0.5 ͷ఺ ͸੨ͱ੺Λಉྔࠞͥͨࢵ৭ͰృΔɻ άϥϑ (c) ͸ॳΊͯͷ M εςοϓͷ݁ՌͰ͋Γɺྫ͑͹੨ͷΨ΢ε෼ ෍ͷฏۉ͸֤σʔλ͕੨ͷΫϥελʔʹଐ͢Δ֬཰ (γ1 (xn )) ͰॏΈ෇ ͚ΒΕͨॏ͖෇͖ฏۉʹͳ͍ͬͯΔɻ 45 / 67
  43. 2-2-2. ࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜ ͞Βʹ (d),(e),(c) ͸ͦΕͧΕ EM εςοϓΛ 2

    ճɺ5 ճɺ20 ճߦͬͨޙ ͷෛ୲཰ͱ֤ύϥϝʔλͰ͋Γɺ20 ճߦ͏ͱΑ͘ΫϥελϦϯά͞Ε ͍ͯΔ͜ͱ͕ݟͯͱΕΔɻ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜΛಋೖͨ͠ɻ ࣍ʹҰൠతͳ EM ΞϧΰϦζϜΛಋೖ͢Δɻ 46 / 67
  44. 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ ͜Ε·Ͱɺࠞ߹Ψ΢ε෼෍ݶఆͷ EM ΞϧΰϦζϜʹ͍ͭͯઆ໌ͨ͠ ͕ɺ͜͜Ͱ͸Ұൠతͳ EM ΞϧΰϦζϜʹ͍ͭͯಋೖ͢Δɻ ·ͣɺσʔλͷू߹Λͻͱ·ͱΊʹ

    N × D ߦྻ X = (x1 , · · · , xN )T ͱ ͠ɺજࡏม਺΋ͻͱ·ͱΊʹ N × K ߦྻ Z = (z1 , · · · , zN )T ͱ͢Δɻ ͢Δͱɺ؍ଌσʔλͷ X ͷର਺໬౓͸ҎԼͷΑ͏ʹͳΔɻ ln p(X|θ) = ln { ∑ Z p(X, Z|θ) } (2.41) ͜͜Ͱɺθ ͸ϞσϧͷύϥϝʔλͰ͋Γɺࠞ߹Ψ΢ε෼෍ͷ࣌͸ π, µ, Σ ʹରԠ͢Δɻ ࠷໬ਪఆΛߦ͏৔߹ɺln p(X|θ) Λ࠷େʹ͢Δ θ Λ΋ͱΊΔ͕ɺ(2.41) ͷӈลͷ p(X, Z|θ) ͕ࢦ਺ܕ෼෍଒Ͱ͋ͬͯ΋ɺ࿨ ∑ Z Λͱͬͯ ln Λ ͱΔ͜ͱͰɺln p(X|θ) ͸ࢦ਺ܕ෼෍଒Ͱͳ͘ҰൠతʹෳࡶͳܗʹͳΔɻ Αͬͯɺ࠷໬ղ΋ෳࡶʹͳΔɻ 47 / 67
  45. 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ ͜͜Ͱɺ΋͠؍ଌʹΑͬͯ X ͚ͩͰ͸ͳ͘ɺZ ΋ಘΒΕΔͱ͢Δɻ ͜ͷͱ͖ {X, Z}

    Λ׬શσʔλू߹ͱ͍͍ɺX Λෆ׬શσʔλू߹ͱ ͍͏ɻ ΋͠ {X, Z} ͕༩͑ΒΕͨΒɺ׬શσʔλ໬౓ؔ਺ p(X, Z|θ) Λ࠷େʹ ͢Δύϥϝʔλ θ ΛٻΊΔ͜ͱ͕Ͱ͖Δɻ(͜͜ͱ͜ΕҎ߱Ͱ ln p(X, Z|θ) ͸؆୯ʹ࠷େԽͰ͖Δͱ͢Δɻ) ͨͩ͠ɺ؍ଌʹΑΓ Z ͕ಘΒΕͳ͍͜ͱ͕΄ͱΜͲͰ͋Δɻ ΑͬͯɺZ ͕Θ͔Βͳ͍ͷͰɺln p(X, Z|θ) Λ࠷େʹ͢Δ͜ͱ͸Ͱ͖ͣɺ ୅ΘΓʹ ln p(X, Z|θ) ͷ Z ͷࣄޙ෼෍ p(Z|X, θ) ʹ͓͚Δظ଴஋Λར༻ ͢Δ͜ͱΛߟ͑Δɻ ͳͥɺࣄޙ෼෍ p(Z|X, θ) Λར༻͢Δͷ͔ʹ͍ͭͯ͸ɺ2-4 Ͱઆ໌͢Δɻ 48 / 67
  46. 2-3. EM ΞϧΰϦζϜͷ΋͏Ұͭͷղऍ X ͸؍ଌʹΑΓ༩͑ΒΕ͍ͯͯɺp(X, Z|θ) ͷܗ΋Θ͔͍ͬͯΔͱ͖ɺ p(X|θ) Λ࠷େʹ͢Δ θ

    ΛٻΊΔ EM ΞϧΰϦζϜ͸ҎԼͰ͋Δɻ 1. ύϥϝʔλ θold ΛॳظԽ͢Δ 2. (E εςοϓ) ࣄޙ෼෍ p(Z|X, θold) Λܭࢉ͢Δ 3. (M εςοϓ) ࣍ࣜͰ༩͑ΒΕΔ θnew ΛٻΊΔ θnew = argmaxθ Q(θ, θold) (2.42) ͨͩ͠ɺ Q(θ, θold) = ∑ Z p(Z|X, θold) ln p(X, Z|θ) (2.43) 4. ໬౓ؔ਺ p(X|θ) ͷ஋͕ऩଋ৚݅Λຬ͍ͨͯ͠Δ͔Ͳ͏͔Λௐ΂ɺຬͨ͞Ε͍ͯͳ ͚Ε͹ɺ θold ← θnew (2.44) ͱ͠ɺεςοϓ 2 ʹ໭Δ 49 / 67
  47. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͦΕͰ͸ɺࠓઆ໌ͨ͠Ұൠతͳ EM ΞϧΰϦζϜΛࠞ߹Ψ΢ε෼෍ͷ৔ ߹ʹద༻ͯ͠ɺͦͷ݁Ռ͕ 2-2-2 ͱ౳͍͜͠ͱΛΈΔɻ ·ͣ׬શσʔλ໬౓ؔ਺͸ (2.11)

    ͱ (2.14) ΑΓ p(X, Z|π, µ, Σ) = N ∏ n=1 K ∏ k=1 πznk k N(xn |µk , Σk )znk (2.45) ͱͳΔɻ ͜͜Ͱɺznk ͸ zn ͷ k ൪໨ͷཁૉͰ͋Δɻ ֬཰ͷ৐๏ఆཧͱՃ๏ఆཧΑΓɺࣄޙ෼෍ p(Z|X, π, µ, Σ) ͸ p(Z|X, π, µ, Σ) = p(X, Z|π, µ, Σ) p(X|π, µ, Σ) = p(X, Z|π, µ, Σ) ∑ Z′ p(X, Z′|π, µ, Σ) (2.46) ͱͳΔɻ 50 / 67
  48. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͦΕͰ͸ɺEM ΞϧΰϦζϜͷεςοϓ 1 ͱͯ͠ɺύϥϝʔλ π, µ, Σ Λ

    πold, µold, Σold ॳظԽ͢Δɻ ͜͜Ͱɺεςοϓ 2 ʹೖΔલʹɺθ = {π, µ, Σ} ͱ͢ΔͱɺQ(θ, θold) ͸ Q(θ, θold) = ∑ Z p(Z|X, πold, µold, Σold) ln p(X, Z|π, µ, Σ) = 1 ∑ Z′ p(X, Z′|πold, µold, Σold) × ∑ Z p(X, Z|πold, µold, Σold) ln p(X, Z|π, µ, Σ) (2.47) ͱͳΔɻ 51 / 67
  49. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ (2.45) ΑΓɺ(2.47) ͸ Q(θ, θold) = N ∑

    n=1 K ∑ k=1 [∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) ] ln { πk N(xn |µk , Σk ) } (2.48) ͱͳΔɻ ͜͜Ͱɺ·ͨ (2.45) ΑΓɺ ∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) = ∑ Z znk N ∏ m=1 K ∏ k′=1 [πold k′ N(xm |µold k′ , Σold k′ )]zmk′ ∑ Z′ N ∏ l=1 K ∏ k′′=1 [πold k′′ N(xl |µold k′′ , Σold k′′ )]z′ lk′′ (2.49) ͱͳΔɻ 52 / 67
  50. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͜͜Ͱɺ ∑ Z = ∑ z1 · ·

    · ∑ zN Λ༻͍Δͱɺ(2.49) ͸ ∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) = ∑ zn znk K ∏ k′=1 [πold k′ N(xn |µold k′ , Σold k′ )]znk′ ∑ z′ n K ∏ k′′=1 [πold k′′ N(xn |µold k′′ , Σold k′′ )]z′ nk′′ (2.50) ͱͳΔɻ(n ൪໨ͷҼࢠҎ֎͸෼฼෼ࢠͰΩϟϯηϧ͞ΕΔ) (2.50) ͷ෼ࢠ͸ ∑ zn ͷதͰ k ੒෼໨͕ znk = 1 ͱͳΔ߲ͷΈ࢒ΔͷͰɺ ∑ zn znk K ∏ k′=1 [πold k′ N(xn |µold k′ , Σold k′ )]znk′ = πold k N(xn |µold k , Σold k ) (2.51) ͱͳΔɻ 53 / 67
  51. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ Ұํɺ(2.50) ͷ෼฼͸ɺ ∑ zn ͷ߹ܭ K ݸͷ࿨͕શͯ࢒ΔͷͰɺ ∑

    z′ n K ∏ k′′=1 [πold k′′ N(xn |µold k′′ , Σold k′′ )]z′ nk′′ = K ∑ j=1 πold j N(xn |µold j , Σold j ) (2.52) ͱͳΔɻ (2.51) ͱ (2.52) ΑΓ (2.49) ͸ ∑ Z znk p(X, Z|πold, µold, Σold) ∑ Z′ p(X, Z′|πold, µold, Σold) = πold k N(xn |µold k , Σold k ) K ∑ j=1 πold j N(xn |µold j , Σold j ) =γk (xn , πold, µold, Σold) (2.53) ͱͳΓɺෛ୲཰͕ग़ͯ͘Δɻ((2.16) Λ࢖༻ͨ͠) ͜͜ʹग़ͯ͘Δෛ୲཰͸ݱࡏͷύϥϝʔλ θold ͷෛ୲཰Ͱ͋Δ͜ͱʹ ஫ҙɻ 54 / 67
  52. 2-3-1. ࠞ߹Ψ΢ε෼෍࠶๚ ͭ·Γɺεςοϓ 2(E εςοϓ) Ͱࣄޙ෼෍Λܭࢉ͢Δͱݴ͏͜ͱ͸ɺ ࠞ߹Ψ΢ε෼෍Ͱ͸ෛ୲཰ γk (xn ,

    πold, µold, Σold) Λܭࢉ͢Δͱ͍͏͜ ͱʹͳΔɻ ෛ୲཰Λܭࢉͨ͠ΒɺQ(θ, θold) ͸ (2.49) ΑΓ Q(θ, θold) = N ∑ n=1 K ∑ k=1 γk (xn , πold, µold, Σold) ln { πk N(xn |µk , Σk ) } (2.54) ͱͳΓɺεςοϓ 3(M εςοϓ) Ͱ͸ɺQ(θ, θold) Λ࠷େʹ͢Δ θnew = {πnew, µnew, Σnew} Λݟ͚ͭΔɻ ࣮ࡍʹ Q(θ, θold) Λ π, µ, Σ Ͱ࠷େʹ͢ΔࣜΛಋग़͢ΔͱɺҎલಋग़͠ ͨ (2.25) ͱ (2.27) ͱ (2.34) ͱಉ͡ܗͷ͕ࣜग़ͯདྷΔ͜ͱ͕Θ͔Δɻ (PRML ԋश 9.8) 55 / 67
  53. 2-3-2. k-means ͱͷؔ࿈ ͜͜Ͱ͸ɺࠞ߹Ψ΢ε෼෍ͷ EM ΞϧΰϦζϜͷಛघͳ৔߹͕ k-means ΞϧΰϦζϜͰ͋Δ͜ͱΛݟΔɻ ͦͷͨΊʹ·ͣɺҎԼͷΑ͏ʹ (2.13)

    ͷ৚݅෇͖֬཰Ͱɺڞ෼ࢄߦྻ Λ Σk = ϵI ͱ͢Δɻ p(x|zk = 1) = N(x|µk , ϵI) (k = 1, · · · , K) (2.55) ͜ͷͱ͖ɺσʔλ఺ xn ͷෛ୲཰͸ (2.16) ΑΓ γk (xn ) = πk N(xn |µk , ϵI) K ∑ j=1 πj N(xn |µj , ϵI) = πk exp { − ∥xn − µk ∥2/2ϵ } K ∑ j=1 πj exp { − ∥xn − µj ∥2/2ϵ } (2.56) ͱͳΔɻ 56 / 67
  54. 2-3-2. k-means ͱͷؔ࿈ ͜͜Ͱɺ∥xn − µj ∥2 Λ࠷খʹ͢Δ j Λ

    j∗ ͱ͢Δɻ(·ͨɺ࠷খͰ͋Δ j ͸ j∗ །ҰͰ͋ΔͱԾఆ͢Δ) ͦͯ͠ɺ(2.56) ͷ࠶ӈลͷ෼฼෼ࢠʹ exp { ∥xn − µj∗ ∥2/2ϵ } Λ͔͚ Δͱɺ γk (xn ) = πk exp { − (∥xn − µk ∥2 − ∥xn − µj∗ ∥2)/2ϵ } K ∑ j=1 πj exp { − (∥xn − µj ∥2 − ∥xn − µj∗ ∥2)/2ϵ } (2.57) ͱͳΓɺk(̸= j∗) Ͱ͸ɺ−(∥xn − µk ∥2 − ∥xn − µj∗ ∥2) < 0 ΑΓɺ γk(̸=j∗) (xn ) → 0 (ϵ → 0) (2.58) ͱͳΔɻ 57 / 67
  55. 2-3-2. k-means ͱͷؔ࿈ Ұํɺk = j∗ Ͱ͸ɺ−(∥xn − µk ∥2

    − ∥xn − µj∗ ∥2) = 0 ͱͳΔɻ ·ͨɺ(2.57) ͷ෼฼ͷ࿨ ∑ K j=1 ͷதͰ j ̸= j∗ ͱͳΔ߲͸ ϵ → 0 Ͱ 0 ʹ ऩଋ͢ΔͷͰ γk=j∗ (xn ) → πj∗ πj∗ (ϵ → 0) = 1 (2.59) ͱͳΔɻ ͭ·Γɺ͜ͷۃݶͰ͸ෛ୲཰ γk (xn ) ͸ 2-1 Ͱಋೖͨ͠ 2 ஋ม਺ rnk ͱ Ұக͢Δɻ 58 / 67
  56. 2-3-2. k-means ͱͷؔ࿈ ·ͨɺ(2.54) ͸͜ͷۃݶͰ͸ Q(θ) = N ∑ n=1

    K ∑ k=1 rnk ln { πk N(xn |µk , ϵI) } = − 1 2 N ∑ n=1 K ∑ k=1 rnk ∥xn − µk ∥2 + const. (2.60) ͱͳΓɺ(2.1) ͷ࿪Έई౓͕ొ৔͠ɺQ(θ) Λ rnk ͱ µk Ͱ࠷େʹ͢Δ͜ ͱ͸ɺ(2.1) ͷ࿪Έई౓Λ rnk ͱ µk Ͱ࠷খʹ͢Δ͜ͱͱ౳ՁͰ͋Δ͜ ͱ͕Θ͔Δɻ 59 / 67
  57. 2-4. Ұൠͷ EM ΞϧΰϦζϜ 2-3 ͰҰൠతͳ EM ΞϧΰϦζϜΛಋೖͨ͠ͱ͖ɺҰൠతʹ؍ଌͰ͸જ ࡏม਺͸ಘΒΕͳ͍ͷͰɺ׬શσʔλͷର਺໬౓ؔ਺ͷજࡏม਺ͷࣄޙ ෼෍

    Q(θ, θold) Λ࠷େʹ͢ΔύϥϝʔλΛ൓෮తʹٻΊΔ͜ͱͰɺෆ׬ શσʔλͷ໬౓ؔ਺ p(X|θ) Λ࠷େʹͨ͠ɻ ͜͜Ͱ͸ɺQ(θ, θold) Λ࠷େʹ͢ΔύϥϝʔλΛ൓෮తʹٻΊΔ͜ͱͰ ෆ׬શσʔλͷ໬౓ؔ਺ p(X|θ) Λ࠷େʹग़དྷΔ͜ͱͷূ໌Λߦ͏ɻ 60 / 67
  58. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ·ͣɺZ ͷ೚ҙͷ֬཰෼෍ؔ਺ q(Z) Λಋೖ͢Δͱɺෆ׬શσʔλͷର ਺໬౓ؔ਺ ln

    p(X|θ) ͸ҎԼͷΑ͏ʹͳΔɻ ln p(X|θ) = { ∑ Z q(Z) } =1 ln p(X|θ) = ∑ Z q(Z) ln { p(X, Z|θ) p(Z|X, θ) } = ∑ Z q(Z) ln { p(X, Z|θ) q(Z) · q(Z) p(Z|X, θ) } = ∑ Z q(Z) ln { p(X, Z|θ) q(Z) } − ∑ Z q(Z) ln { p(Z|X, θ) q(Z) } (2.61) 61 / 67
  59. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ͜͜Ͱɺ৐๏ఆཧ p(X, Z|θ) = p(Z|X, θ)p(X|θ)

    (2.62) Λ༻͍ͨɻ ·ͨɺ L(q, θ) = ∑ Z q(Z) ln { p(X, Z|θ) q(Z) } (2.63) KL(q∥p) = − ∑ Z q(Z) ln { p(Z|X, θ) q(Z) } (2.64) ͱ͢Ε͹ɺ(2.61) ͸ ln p(X|θ) = L(q, θ) + KL(q∥p) (2.65) ͱͳΔɻ 62 / 67
  60. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ͜͜ͰɺPRML ͷ 1.6.1 ΑΓɺKL μΠόʔδΣϯε KL(q∥p)

    ͸ඇෛͰ ͋Δɻ(KL(q∥p) ≥ 0) ౳߸͸ q(Z) = p(Z|X, θ) ͷͱ͖ͷΈ੒ཱ͢Δɻ Ҏ্ΑΓɺln p(X|θ) ≥ L(q, θ) ͱͳΓɺL(q, θ) ͸ ln p(X|θ) ͷԼݶͰ͋ Δɻ(Լਤ) 63 / 67
  61. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ෼ղ (2.65) Λར༻ͯ͠ EM ΞϧΰϦζϜΛఆٛ͠ɺͦΕ͕ର਺໬౓ ln

    p(X|θ) Λ࠷େԽ͢Δ͜ͱΛࣔ͢ɻ ·ͣɺݱࡏͷύϥϝʔλͷ஋͕ θold Ͱ͋Δͱ͢Δɻ E εςοϓͰ͸ɺL(q, θold) Λ θold Λݻఆ͠ͳ͕Β q(Z) ʹ͍ͭͯ࠷େ Խ͢Δɻ ͜ͷ࠷େԽ͸؆୯Ͱɺର਺໬౓ ln p(X|θold) ͸ q(Z) ʹґଘ͠ͳ͍ͷͰɺ q(Z) ΛมԽͤͯ͞΋ ln p(X|θold) ͸มԽ͠ͳ͍ɻ ͜ΕΑΓɺKL(q∥p) = 0 ͭ·Γ q(Z) = p(Z|X, θold) ͷͱ͖ɺL(q, θold) ͸࠷େʹͳΔɻ ͜ͷͱ͖ɺԼਤͷΑ͏ʹ ln p(X|θold) = L(q, θold) ͱͳΔɻ 64 / 67
  62. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ࣍ͷ M εςοϓͰ͸ɺq(Z) = p(Z|X, θold)

    Ͱݻఆ͠ɺL(q, θ) Λ θ ʹͭ ͍ͯ࠷େԽ͢Δɻ L(q, θ) Λ࠷େʹ͢ΔύϥϝʔλΛ θnew ͱ͢Δͱɺ͢Ͱʹ L(q, θ) ͕࠷ େ஋ʹୡ͍ͯ͠ͳ͍ݶΓ (θnew ̸= θold ͷͱ͖)ɺL(q, θ) ͸૿Ճ͠ɺ͞Β ʹ KL μΠόʔδΣϯε͸ҎԼͷΑ͏ʹਖ਼ʹͳΔɻ KL(q∥p) = − ∑ Z p(Z|X, θold) ln { p(Z|X, θnew) p(Z|X, θold) } > 0 (2.66) 65 / 67
  63. 2-4. Ұൠͷ EM ΞϧΰϦζϜ ·ͨɺq(Z) = p(Z|X, θold) ͷͱ͖ͷ L(q,

    θ) ͸ L(q, θ) = ∑ Z p(Z|X, θold) ln { p(X, Z|θ) p(Z|X, θold) } = ∑ Z p(Z|X, θold) ln p(X, Z|θ) − ∑ Z p(Z|X, θold) ln p(Z|X, θold) =Q(θ, θold) + const. (2.67) ͜͜Ͱɺ(2.43) ͷ Q(θ, θold) Λ༻͍ͨɻ ͜͜Ͱͷ const. ͱ͸ɺθ ʹґଘ͠ͳ͍߲Ͱ͋Δɻ ͜ΕΑΓɺM εςοϓͷ L(q, θ) ͷ࠷େԽ͸ Q(θ, θold) ͷ࠷େԽͱ౳Ձ Ͱ͋Δɻ ͜Ε͕ 2-3 ͷ M εςοϓͰɺln p(X, Z|θ) ͷࣄޙ෼෍ p(Z|X, θ) ʹ͓͚ Δظ଴஋ Q(θ, θold) Λ࠷େԽ͢Δཧ༝Ͱ͋Δɻ 67 / 67