Introduction to Machine Learning by Bayesian Inference (ベイズ推論による機械学習入門), Chapter 4, First Half
Takahiro Kawashima
October 01, 2018
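Slides prepared for a reading group.
Atsushi Suyama, 『ベイズ推論による機械学習入門』 (Introduction to Machine Learning by Bayesian Inference), Sections 4.1 to 4.3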
Transcript
Suyama book, Chapter 4, first half
Takahiro Kawashima, October 1, 2018
University of Electro-Communications

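Contents
1. Mixture models and posterior inference
2. Approximation methods for probability distributions
3. Inference in the Poisson mixture model
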
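Mixture models and posterior inference

Motivation for mixture models
Adding several distributions together gives a more complex model → a mixture model.
(Figure: data that a single Gaussian model does not seem able to explain.)
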
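Data generating process of a mixture model
The number of clusters $K$ is known.
Generated data: $X = \{x_1, \dots, x_N\}$
Latent variables (one-hot): $S = \{s_1, \dots, s_N\}$
Mixing proportions: $\pi = (\pi_1, \dots, \pi_K)^\top$
Parameters of each cluster: $\Theta = (\theta_1, \dots, \theta_K)^\top$
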
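Data generating process of a mixture model (continued)
$$p(X, S, \Theta, \pi) = p(X \mid S, \Theta)\, p(S \mid \pi)\, p(\Theta)\, p(\pi) = \left[ \prod_{n=1}^{N} p(x_n \mid s_n, \Theta)\, p(s_n \mid \pi) \right] \left[ \prod_{k=1}^{K} p(\theta_k) \right] p(\pi) \tag{4.5}$$
A categorical distribution is used for $s_n$, and a Dirichlet distribution, its conjugate prior, for the parameter $\pi$:
$$p(s_n \mid \pi) = \mathrm{Cat}(s_n \mid \pi) \tag{4.2}$$
$$p(\pi) = \mathrm{Dir}(\pi \mid \alpha) \tag{4.3}$$
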
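Posterior distribution of a mixture model
The joint posterior of the unknown variables we want to estimate is
$$p(S, \Theta, \pi \mid X) = \frac{p(X, S, \Theta, \pi)}{p(X)} \tag{4.6}$$
To estimate the clusters we additionally need
$$p(S \mid X) = \iint p(S, \Theta, \pi \mid X)\, d\Theta\, d\pi \tag{4.7}$$
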
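Posterior distribution of a mixture model (continued)
To obtain the normalizing term $p(X)$ explicitly we must compute
$$p(X) = \sum_{S} \iint p(X, S, \Theta, \pi)\, d\Theta\, d\pi = \sum_{S} p(X, S) \tag{4.8}$$
The integrals can be evaluated analytically with conjugate priors, but the sum runs over every combination of $S$.
→ Approximate the posterior with MCMC, variational inference, and similar methods.
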
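To make the cost of the sum over $S$ in (4.8) concrete, here is a minimal sketch that counts and enumerates the assignments for $K$ clusters and $N$ data points. The function names are hypothetical and exist only for this illustration; each $s_n$ is represented by a cluster index rather than a one-hot vector.

```python
from itertools import product


def count_assignments(num_clusters: int, num_points: int) -> int:
    """Number of distinct assignment sets S = {s_1, ..., s_N}: K ** N."""
    return num_clusters ** num_points


def enumerate_assignments(num_clusters: int, num_points: int):
    """Iterate over every assignment as a tuple of cluster indices.

    Only feasible for tiny N, which is the point of the illustration.
    """
    return product(range(num_clusters), repeat=num_points)
```

With K = 3, four data points already give 81 terms, and `count_assignments(3, 100)` is a 48-digit number, so brute-force evaluation of $p(X)$ is out of reach for realistic $N$.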
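Approximation methods for probability distributions

Gibbs sampling
We want statistics of a probability distribution $p(z_1, z_2, z_3)$ that is hard to handle
→ sample from $p(z_1, z_2, z_3)$ with MCMC (Markov chain Monte Carlo).
Gibbs sampling: repeatedly sample from the following full conditionals to obtain a sequence of samples from $p(z_1, z_2, z_3)$.
$$z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}, z_3^{(i-1)})$$
$$z_2^{(i)} \sim p(z_2 \mid z_1^{(i)}, z_3^{(i-1)}) \tag{4.10}$$
$$z_3^{(i)} \sim p(z_3 \mid z_1^{(i)}, z_2^{(i)})$$
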
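The alternating updates in (4.10) can be sketched on a two-variable example. The target below is an assumption chosen for illustration: a zero-mean, unit-variance bivariate Gaussian with correlation `rho`, whose full conditionals are $z_1 \mid z_2 \sim \mathcal{N}(\rho z_2, 1 - \rho^2)$ and symmetrically for $z_2$. Function names and the burn-in length are hypothetical.

```python
import random


def gibbs_bivariate_gaussian(rho, num_samples, burn_in=100, seed=0):
    """Gibbs sampling from a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    cond_std = (1.0 - rho ** 2) ** 0.5  # std of each full conditional
    z1, z2 = 0.0, 0.0                   # arbitrary initial state
    samples = []
    for i in range(burn_in + num_samples):
        z1 = rng.gauss(rho * z2, cond_std)  # z1 ~ p(z1 | z2)
        z2 = rng.gauss(rho * z1, cond_std)  # z2 ~ p(z2 | z1), using the new z1
        if i >= burn_in:
            samples.append((z1, z2))
    return samples


def sample_correlation(samples):
    """Empirical correlation coefficient of a list of (z1, z2) pairs."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    cov = sum((s[0] - m1) * (s[1] - m2) for s in samples) / n
    v1 = sum((s[0] - m1) ** 2 for s in samples) / n
    v2 = sum((s[1] - m2) ** 2 for s in samples) / n
    return cov / (v1 * v2) ** 0.5
```

Consecutive samples are correlated, and the chain moves more slowly as `rho` approaches 1, which is one way to see why strongly correlated variables are troublesome for this sampler.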
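Gibbs sampling
Gibbs sampling applied to a 2-D Gaussian distribution (Fig. 4.4)
(Figure: two panels with axes $z_1$ and $z_2$, each showing $p(z)$ and $q(z)$. Blue line: true distribution $p(z)$; other line: approximate distribution $q(z)$ obtained from the set of samples.)
The result tends to become unreliable when the correlation between the variables is large.
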
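Extension 1: blocking Gibbs sampling
Blocking Gibbs sampling: run Gibbs sampling using the joint distribution of $z_2, z_3$.
$$z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}, z_3^{(i-1)})$$
$$z_2^{(i)}, z_3^{(i)} \sim p(z_2, z_3 \mid z_1^{(i)}) \tag{4.11}$$
• Tends to work well even when $z_2$ and $z_3$ are strongly correlated
• $p(z_2, z_3 \mid z_1^{(i)})$ must be easy to sample from
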
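Extension 2: collapsed Gibbs sampling
Collapsed Gibbs sampling: marginalize out $z_3$, then run Gibbs sampling on $p(z_1, z_2)$.
$$p(z_1, z_2) = \int p(z_1, z_2, z_3)\, dz_3 \tag{4.12}$$
$$z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)})$$
$$z_2^{(i)} \sim p(z_2 \mid z_1^{(i)}) \tag{4.13}$$
• A speed-up can be expected
• The marginal distribution must be available analytically
• The remaining variables must have a form that is easy to sample from
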
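Variational inference
Represent the probability distribution $p(z_1, z_2, z_3)$ by an approximate distribution $q(z_1, z_2, z_3)$ that is easy to handle
→ minimize the KL divergence.
$$q_{\mathrm{opt.}}(z_1, z_2, z_3) = \arg\min_{q} \mathrm{KL}[q(z_1, z_2, z_3) \,\|\, p(z_1, z_2, z_3)] \tag{4.14}$$
Variational inference restricts the expressive power of $q$ and then minimizes the KL divergence.
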
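Variational inference
Mean-field approximation: assume independence among the random variables.
$$p(z_1, z_2, z_3) \approx q(z_1)\, q(z_2)\, q(z_3) \tag{4.15}$$
$q(z_1), q(z_2), q(z_3)$ are corrected one after another so that the KL divergence becomes smaller.
Notation: $\langle \cdot \rangle_{q(z_1) q(z_2) q(z_3)} = \langle \cdot \rangle_{1,2,3}$
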
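Variational inference
Optimize $q(z_1)$ with $q(z_2), q(z_3)$ held fixed:
$$q_{\mathrm{opt.}}(z_1) = \arg\min_{q(z_1)} \mathrm{KL}[q(z_1) q(z_2) q(z_3) \,\|\, p(z_1, z_2, z_3)] \tag{4.16}$$
$$\mathrm{KL}[q(z_1) q(z_2) q(z_3) \,\|\, p(z_1, z_2, z_3)] = -\left\langle \ln \frac{p(z_1, z_2, z_3)}{q(z_1) q(z_2) q(z_3)} \right\rangle_{1,2,3} \tag{4.18}$$
$$= -\left\langle \left\langle \ln \frac{p(z_1, z_2, z_3)}{q(z_1) q(z_2) q(z_3)} \right\rangle_{2,3} \right\rangle_{1} \tag{4.19}$$
$$= -\left\langle \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} - \langle \ln q(z_1) \rangle_{2,3} - \langle \ln q(z_2) \rangle_{2,3} - \langle \ln q(z_3) \rangle_{2,3} \right\rangle_{1} \tag{4.20}$$
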
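Variational inference
Using $\langle \ln q(z_1) \rangle_{2,3} = \ln q(z_1)$ and collecting the parts unrelated to $q(z_1)$ into a constant:
$$= -\left\langle \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} - \ln q(z_1) \right\rangle_{1} + \mathrm{const.} \tag{4.21}$$
$$= -\left\langle \ln\left[\exp(\langle \ln p(z_1, z_2, z_3) \rangle_{2,3})\right] - \ln q(z_1) \right\rangle_{1} + \mathrm{const.}$$
$$= -\left\langle \ln \frac{\exp(\langle \ln p(z_1, z_2, z_3) \rangle_{2,3})}{q(z_1)} \right\rangle_{1} + \mathrm{const.} \tag{4.22}$$
$$= \mathrm{KL}[q(z_1) \,\|\, \exp\{\langle \ln p(z_1, z_2, z_3) \rangle_{2,3}\}] + \mathrm{const.} \tag{4.23}$$
The minimum of (4.23) is finally attained at
$$\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2) q(z_3)} + \mathrm{const.} \tag{4.24}$$
(and likewise for $q(z_2)$ and $q(z_3)$).
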
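Variational inference
Variational inference with the mean-field approximation (Algorithm 4.1)
Initialize $q(z_2), q(z_3)$
for $i = 1, \dots, \mathrm{max\_iter}$ do
  $\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2) q(z_3)} + \mathrm{const.}$
  $\ln q(z_2) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_1) q(z_3)} + \mathrm{const.}$
  $\ln q(z_3) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_1) q(z_2)} + \mathrm{const.}$
end for
We would like a somewhat smarter stopping condition
→ for example, use the ELBO (evidence lower bound) as the criterion.
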
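The loop of Algorithm 4.1 can be sketched on a two-variable Gaussian target $p(z) = \mathcal{N}(z \mid \mu, \Lambda^{-1})$, where update (4.24) has a closed form: $q(z_1)$ is Gaussian with variance $1/\Lambda_{11}$ and mean $\mu_1 - \Lambda_{12}(m_2 - \mu_2)/\Lambda_{11}$, and symmetrically for $q(z_2)$. This target, the function name, and the stopping rule (change in the means, used as a simple stand-in for an ELBO-based criterion) are assumptions made for illustration.

```python
def mean_field_gaussian(mu, precision, max_iter=100, tol=1e-10):
    """Mean-field VI for a 2-D Gaussian with mean mu and 2x2 precision matrix.

    Returns ((mean1, var1), (mean2, var2)) for the factors q(z1), q(z2).
    """
    (l11, l12), (l21, l22) = precision
    m1, m2 = 0.0, 0.0  # arbitrary initial means of q(z1), q(z2)
    for _ in range(max_iter):
        # ln q(z1) = <ln p(z1, z2)>_{q(z2)} + const.  ->  Gaussian in z1
        new_m1 = mu[0] - l12 * (m2 - mu[1]) / l11
        # ln q(z2) = <ln p(z1, z2)>_{q(z1)} + const.  ->  Gaussian in z2
        new_m2 = mu[1] - l21 * (new_m1 - mu[0]) / l22
        change = abs(new_m1 - m1) + abs(new_m2 - m2)
        m1, m2 = new_m1, new_m2
        if change < tol:  # stand-in stopping rule (not the ELBO)
            break
    return (m1, 1.0 / l11), (m2, 1.0 / l22)
```

For a target with unit variances and correlation 0.8, the factor means converge to the true means, but each factor variance is $1/\Lambda_{11} = 1 - 0.8^2 = 0.36$ rather than the true marginal variance 1: the factorized $q$ cannot represent the correlation.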
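Variational inference
ELBO (A.4, p.233)
Variational inference can be viewed as a method that maximizes a lower bound on the log marginal likelihood.
$X$: observed data, $Z$: unobserved variables. Assume $Z \sim q(Z)$.
$$\ln p(X) = \ln \int p(X, Z)\, dZ = \ln \int q(Z) \frac{p(X, Z)}{q(Z)}\, dZ \ge \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ \quad (\text{Jensen's inequality}) =: \mathcal{L}[q(Z)] \tag{A.39}$$
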
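Variational inference
Reference: Jensen's inequality
For any concave ("upward convex") function $f$ and any probability density function $p$,
$$f\left( \int y(x)\, p(x)\, dx \right) \ge \int f(y(x))\, p(x)\, dx \tag{A.40}$$
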
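A quick numerical check of (A.40) for the concave function $f = \ln$, with the integral replaced by a sum over a discrete distribution. The function name and the example values are hypothetical.

```python
import math


def jensen_gap(values, probs):
    """Return ln(E[y]) - E[ln y] for a discrete distribution over positive y.

    Jensen's inequality for the concave function ln says the gap is >= 0.
    """
    expectation = sum(p * y for y, p in zip(values, probs))
    lhs = math.log(expectation)                                # f(E[y])
    rhs = sum(p * math.log(y) for y, p in zip(values, probs))  # E[f(y)]
    return lhs - rhs
```

The gap is strictly positive whenever the values differ and vanishes when $y$ is constant, which is also when the bound in (A.39) is tight.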
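Variational inference
ELBO (A.4, p.233)
The lower bound $\mathcal{L}[q(Z)]$ on the log marginal likelihood is called the ELBO of $q(Z)$.
The difference between the log marginal likelihood and the ELBO equals the KL divergence between $q(Z)$ and $p(Z \mid X)$:
$$\mathrm{KL}[q(Z) \,\|\, p(Z \mid X)] = \int q(Z) \ln \frac{q(Z)}{p(Z \mid X)}\, dZ = \int q(Z) \ln \frac{q(Z)\, p(X)}{p(X, Z)}\, dZ = \ln p(X) - \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ = \ln p(X) - \mathcal{L}[q(Z)] \tag{A.41}$$
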
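Variational inference
ELBO (A.4, p.233)
$$\mathrm{KL}[q(Z) \,\|\, p(Z \mid X)] = \ln p(X) - \mathcal{L}[q(Z)] \tag{A.41}$$
$\ln p(X)$ is a constant once the data and the model are given
→ minimizing the KL divergence with respect to $q(Z)$ is equivalent to maximizing the lower bound $\mathcal{L}[q(Z)]$ on the log marginal likelihood.
Stop the variational inference algorithm when the change in the ELBO becomes smaller than a constant $\epsilon$.
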
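The decomposition $\ln p(X) = \mathcal{L}[q(Z)] + \mathrm{KL}[q(Z) \,\|\, p(Z \mid X)]$ can be checked numerically when $Z$ is discrete. In this sketch `joint[z]` stands for $p(X = x_{\mathrm{obs}}, Z = z)$ at one fixed observation; the function name and the numbers used with it are made up for illustration.

```python
import math


def evidence_elbo_kl(joint, q):
    """Return (ln p(X), ELBO, KL[q || p(Z|X)]) for a discrete latent variable.

    joint[z] = p(X = x_obs, Z = z) for the fixed observation, q[z] = q(Z = z).
    """
    evidence = sum(joint)                        # p(X) = sum_z p(X, z)
    posterior = [j / evidence for j in joint]    # p(z | X)
    elbo = sum(qz * math.log(jz / qz) for jz, qz in zip(joint, q))
    kl = sum(qz * math.log(qz / pz) for pz, qz in zip(posterior, q))
    return math.log(evidence), elbo, kl
```

Because $\ln p(X)$ does not depend on $q$, any change to $q$ that raises the ELBO lowers the KL divergence by the same amount; choosing $q$ equal to the posterior makes the KL term zero and the ELBO equal to $\ln p(X)$.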
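Variational inference
Structured variational inference: factorize the true distribution only partially into approximating functions.
$$p(z_1, z_2, z_3) \approx q(z_1)\, q(z_2, z_3) \tag{4.26}$$
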
Variational inference (simple experiment)
Variational inference applied to a 2-D Gaussian distribution (Fig. 4.5)
(Figure: ten panels, iterations 1 of 10 through 10 of 10. Blue line: true distribution; other line: approximate posterior.)

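Variational inference (simple experiment)
Variational inference applied to a 2-D Gaussian distribution (Fig. 4.5)
(Figure: KL divergence versus iteration, iterations 1 to 10.)
The KL divergence decreases monotonically.
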
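Variational inference (simple experiment)
Variational inference applied to a 2-D Gaussian distribution (Fig. 4.5)
• Fast
• The KL divergence decreases monotonically at every iteration
• Strong correlations cannot be captured
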
Inference in the Poisson mixture model

Poisson mixture model
Estimate the clusters of 1-D discrete non-negative data (Fig. 4.6)
(Figure: histogram of the data, labeled "observation".)

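Poisson mixture model
From
$$p(x_n \mid \lambda_k) = \mathrm{Poi}(x_n \mid \lambda_k) \tag{4.27}$$
we get
$$p(x_n \mid s_n, \lambda) = \prod_{k=1}^{K} \mathrm{Poi}(x_n \mid \lambda_k)^{s_{n,k}} \tag{4.28}$$
Conjugate prior for $\lambda_k$:
$$p(\lambda_k) = \mathrm{Gam}(\lambda_k \mid a, b) \tag{4.29}$$
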
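As an illustration of the model just defined, the following sketch draws synthetic data by ancestral sampling: $\pi \sim \mathrm{Dir}(\alpha)$, $\lambda_k \sim \mathrm{Gam}(a, b)$ with $b$ treated as a rate parameter, $s_n \sim \mathrm{Cat}(\pi)$, $x_n \sim \mathrm{Poi}(\lambda_{s_n})$. The function names are hypothetical, the labels are stored as cluster indices instead of one-hot vectors, and the Poisson sampler is Knuth's multiplication method, chosen only because the standard library has none.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam (say, below a few hundred)."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def sample_poisson_mixture(num_points, alpha, a, b, seed=0):
    """Ancestral sampling of (data, labels, lam, pi) from a Poisson mixture."""
    rng = random.Random(seed)
    num_clusters = len(alpha)
    # pi ~ Dir(alpha): normalize independent Gamma(alpha_k, 1) draws
    g = [rng.gammavariate(al, 1.0) for al in alpha]
    pi = [v / sum(g) for v in g]
    # lambda_k ~ Gam(a, b); gammavariate takes a scale, so pass 1 / rate
    lam = [rng.gammavariate(a, 1.0 / b) for _ in range(num_clusters)]
    labels = [rng.choices(range(num_clusters), weights=pi)[0]
              for _ in range(num_points)]
    data = [sample_poisson(lam[k], rng) for k in labels]
    return data, labels, lam, pi
```

For example, `sample_poisson_mixture(50, [1.0, 1.0], 20.0, 0.2)` draws 50 counts from two clusters whose rates have prior mean $a/b = 100$.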
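Gibbs sampling
For a mixture distribution it works well to sample the latent variables and the parameters separately.
$$S \sim p(S \mid X, \lambda, \pi) \tag{4.31}$$
$$\lambda, \pi \sim p(\lambda, \pi \mid X, S) \tag{4.32}$$
Focusing only on the variable $S$:
$$p(S \mid X, \lambda, \pi) \propto p(X \mid S, \lambda)\, p(S \mid \pi) = \prod_{n=1}^{N} p(x_n \mid s_n, \lambda)\, p(s_n \mid \pi) \tag{4.33}$$
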
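Gibbs sampling
Computing $p(x_n \mid s_n, \lambda)$ and $p(s_n \mid \pi)$ in turn finally gives
$$s_n \sim \mathrm{Cat}(s_n \mid \eta_n) \tag{4.37}$$
where
$$\eta_{n,k} \propto \exp\{x_n \ln \lambda_k - \lambda_k + \ln \pi_k\} \quad \left( \text{s.t. } \sum_{k=1}^{K} \eta_{n,k} = 1 \right) \tag{4.38}$$
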
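A minimal sketch of (4.37) and (4.38): compute the unnormalized log weights, normalize them stably by subtracting the maximum before exponentiating, and draw $s_n$ as a cluster index. The function names are hypothetical.

```python
import math
import random


def categorical_weights(x_n, lam, pi):
    """eta_{n,k} proportional to exp(x_n ln lam_k - lam_k + ln pi_k), normalized."""
    log_w = [x_n * math.log(l) - l + math.log(p) for l, p in zip(lam, pi)]
    m = max(log_w)                        # subtract the max to avoid overflow
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def sample_label(x_n, lam, pi, rng):
    """Draw s_n ~ Cat(eta_n), returned as a cluster index."""
    eta = categorical_weights(x_n, lam, pi)
    return rng.choices(range(len(eta)), weights=eta)[0]
```

With `lam = [100.0, 150.0]` and equal mixing proportions, an observation of 100 gets almost all of its weight on the first cluster and an observation of 150 on the second.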
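Gibbs sampling
$$p(\lambda, \pi \mid X, S) \propto p(X, S, \lambda, \pi) = p(X \mid S, \lambda)\, p(S \mid \pi)\, p(\lambda)\, p(\pi) \tag{4.39}$$
→ the posteriors of $\lambda$ and $\pi$ are independent.
Looking only at the factors that involve $\lambda$:
$$p(\lambda \mid X, S) \propto p(X \mid S, \lambda)\, p(\lambda)$$
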
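Gibbs sampling
Carrying out the computation explicitly gives
$$\lambda_k \sim \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k) \tag{4.41}$$
where
$$\hat{a}_k = \sum_{n=1}^{N} s_{n,k}\, x_n + a, \qquad \hat{b}_k = \sum_{n=1}^{N} s_{n,k} + b \tag{4.42}$$
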
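Gibbs sampling
Looking only at the factors that involve $\pi$:
$$p(\pi \mid X, S) \propto p(S \mid \pi)\, p(\pi)$$
Finally,
$$\pi \sim \mathrm{Dir}(\pi \mid \hat{\alpha}) \tag{4.44}$$
where
$$\hat{\alpha}_k = \sum_{n=1}^{N} s_{n,k} + \alpha_k \tag{4.45}$$
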
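The parameter step of the Gibbs sampler, (4.41) to (4.45), can be sketched as follows. Labels are cluster indices rather than one-hot vectors, the Gamma distribution is parameterized by shape and rate as in $\mathrm{Gam}(\lambda \mid a, b)$, and the function name is hypothetical. Note that `random.gammavariate` expects a scale, so the rate is inverted.

```python
import random


def sample_parameters(data, labels, a, b, alpha, rng):
    """Draw lam_k ~ Gam(a_hat_k, b_hat_k) and pi ~ Dir(alpha_hat) given S."""
    num_clusters = len(alpha)
    sum_x = [0.0] * num_clusters   # sum_n s_{n,k} x_n
    counts = [0] * num_clusters    # sum_n s_{n,k}
    for x, k in zip(data, labels):
        sum_x[k] += x
        counts[k] += 1
    # (4.41), (4.42): a_hat_k = sum_x[k] + a, b_hat_k = counts[k] + b (a rate)
    lam = [rng.gammavariate(sum_x[k] + a, 1.0 / (counts[k] + b))
           for k in range(num_clusters)]
    # (4.44), (4.45): Dirichlet draw via normalized Gamma(alpha_hat_k, 1) draws
    g = [rng.gammavariate(counts[k] + alpha[k], 1.0)
         for k in range(num_clusters)]
    total = sum(g)
    pi = [v / total for v in g]
    return lam, pi
```

One full Gibbs sweep alternates this step with a draw of every $s_n$ from its categorical full conditional.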
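Variational inference
Factorize into latent variables and parameters (the variational Bayes EM algorithm):
$$p(S, \lambda, \pi \mid X) \approx q(S)\, q(\lambda, \pi) \tag{4.46}$$
Using the variational inference formula
$$\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2) q(z_3)} + \mathrm{const.} \tag{4.24}$$
for $q(S)$ gives
$$\ln q(S) = \langle \ln p(X, S, \lambda, \pi) \rangle_{q(\lambda, \pi)} + \mathrm{const.}$$
$$= \langle \ln p(X \mid S, \lambda)\, p(S \mid \pi)\, p(\lambda)\, p(\pi) \rangle_{q(\lambda, \pi)} + \mathrm{const.}$$
$$= \langle \ln p(X \mid S, \lambda) \rangle_{q(\lambda)} + \langle \ln p(S \mid \pi) \rangle_{q(\pi)} + \mathrm{const.}$$
$$= \sum_{n=1}^{N} \left[ \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} + \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} \right] + \mathrm{const.} \tag{4.47}$$
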
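Variational inference
First term inside the sum in (4.47):
$$\langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} = \sum_{k=1}^{K} \langle s_{n,k} \ln \mathrm{Poi}(x_n \mid \lambda_k) \rangle_{q(\lambda_k)} = \sum_{k=1}^{K} s_{n,k} \left( x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle \right) + \mathrm{const.} \tag{4.48}$$
Second term:
$$\langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} = \langle \ln \mathrm{Cat}(s_n \mid \pi) \rangle_{q(\pi)} = \sum_{k=1}^{K} s_{n,k} \langle \ln \pi_k \rangle \tag{4.49}$$
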
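Variational inference
From (4.47), (4.48), (4.49):
$$\ln q(s_n) = \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} + \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} + \mathrm{const.} = \sum_{k=1}^{K} s_{n,k} \left( x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle + \langle \ln \pi_k \rangle \right) + \mathrm{const.}$$
Since $\ln \mathrm{Cat}(s_n \mid \pi) = \sum_{k=1}^{K} s_{n,k} \ln \pi_k$, this has the form of a categorical distribution:
$$q(s_n) = \mathrm{Cat}(s_n \mid \eta_n) \tag{4.50}$$
where
$$\eta_{n,k} \propto \exp\{ x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle + \langle \ln \pi_k \rangle \} \quad \left( \text{s.t. } \sum_{k=1}^{K} \eta_{n,k} = 1 \right) \tag{4.51}$$
The expectations over $\lambda$ and $\pi$ are deferred for now.
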
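Variational inference
Next, the approximate distribution of the parameters:
$$\ln q(\lambda, \pi) = \langle \ln p(X, S, \lambda, \pi) \rangle_{q(S)} + \mathrm{const.} = \langle \ln p(X \mid S, \lambda) \rangle_{q(S)} + \ln p(\lambda) + \langle \ln p(S \mid \pi) \rangle_{q(S)} + \ln p(\pi) + \mathrm{const.}$$
This shows that $\lambda$ and $\pi$ factorize independently
→ instead of $q(\lambda, \pi)$ it suffices to find $q(\lambda)$ and $q(\pi)$ separately.
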
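Variational inference
Proceeding as for $q(s_n)$, the result is
$$q(\lambda_k) = \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k) \tag{4.54}$$
where
$$\hat{a}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle x_n + a, \qquad \hat{b}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle + b \tag{4.55}$$
and
$$q(\pi) = \mathrm{Dir}(\pi \mid \hat{\alpha}) \tag{4.56}$$
where
$$\hat{\alpha}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle + \alpha_k \tag{4.57}$$
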
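Variational inference
The expectation $\langle s_{n,k} \rangle = \langle s_{n,k} \rangle_{q(S)}$ in (4.57) follows from
$$q(s_n) = \mathrm{Cat}(s_n \mid \eta_n) \tag{4.50}$$
which gives
$$\langle s_{n,k} \rangle_{q(S)} = \eta_{n,k}$$
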
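Variational inference
Now that $q(\lambda_k) = \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k)$ and $q(\pi) = \mathrm{Dir}(\pi \mid \hat{\alpha})$ are known, compute the expectations $\langle \lambda \rangle, \langle \ln \lambda \rangle, \langle \ln \pi \rangle$ needed by $q(s_n)$, which were deferred earlier. We use
$$\mathbb{E}_{\lambda \sim \mathrm{Gam}(\lambda \mid a, b)}[\lambda] = \frac{a}{b} \tag{2.59}$$
$$\mathbb{E}_{\lambda \sim \mathrm{Gam}(\lambda \mid a, b)}[\ln \lambda] = \psi(a) - \ln b \tag{2.60}$$
$$\mathbb{E}_{\pi \sim \mathrm{Dir}(\pi \mid \alpha)}[\ln \pi_k] = \psi(\alpha_k) - \psi\left( \sum_{l=1}^{K} \alpha_l \right) \tag{2.52}$$
where $\psi(x)$ is the digamma function
$$\psi(x) = \frac{d}{dx} \ln \Gamma(x) \tag{A.26}$$
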
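Variational inference
Using (2.59), (2.60), (2.52), the required expectations are
$$\langle \lambda_k \rangle = \frac{\hat{a}_k}{\hat{b}_k} \tag{4.60}$$
$$\langle \ln \lambda_k \rangle = \psi(\hat{a}_k) - \ln \hat{b}_k \tag{4.61}$$
$$\langle \ln \pi_k \rangle = \psi(\hat{\alpha}_k) - \psi\left( \sum_{l=1}^{K} \hat{\alpha}_l \right) \tag{4.62}$$
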
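The expectations (4.60) to (4.62) and the update (4.51) can be sketched with the standard library only. Python's `math` module has no digamma function, so the block includes a small approximation (recurrence $\psi(x) = \psi(x+1) - 1/x$ followed by the asymptotic series); that helper and all function names are assumptions made for this illustration, not part of the deck.

```python
import math


def digamma(x):
    """Approximate psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def variational_expectations(a_hat, b_hat, alpha_hat):
    """Return (<lam_k>, <ln lam_k>, <ln pi_k>) as lists over clusters k."""
    mean_lam = [a / b for a, b in zip(a_hat, b_hat)]                          # (4.60)
    mean_log_lam = [digamma(a) - math.log(b) for a, b in zip(a_hat, b_hat)]   # (4.61)
    psi_total = digamma(sum(alpha_hat))
    mean_log_pi = [digamma(al) - psi_total for al in alpha_hat]               # (4.62)
    return mean_lam, mean_log_lam, mean_log_pi


def update_eta(x_n, mean_lam, mean_log_lam, mean_log_pi):
    """eta_{n,k} from (4.51), normalized with the max-subtraction trick."""
    log_w = [x_n * ll - l + lp
             for l, ll, lp in zip(mean_lam, mean_log_lam, mean_log_pi)]
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]
```

A full variational loop would alternate `update_eta` for every $n$ with recomputing $\hat{a}_k, \hat{b}_k, \hat{\alpha}_k$ from $\langle s_{n,k} \rangle = \eta_{n,k}$.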
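Collapsed Gibbs sampling
Collapsed Gibbs sampling for a mixture model marginalizes the parameters out of the joint distribution:
$$p(X, S) = \iint p(X, S, \lambda, \pi)\, d\lambda\, d\pi \tag{4.63}$$
After that it would be enough to sample from $p(S \mid X)$, but...
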
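Collapsed Gibbs sampling
Graphical models before and after marginalization (Fig. 4.7)
After marginalization, $s_n$ depends on every other element of $S$ (a complete graph).
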
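Collapsed Gibbs sampling
$$p(S \mid X) = \frac{p(X \mid S)\, p(S)}{\sum_{S} p(X \mid S)\, p(S)}$$
so sampling from $p(S \mid X)$ directly requires $K^N$ computations to evaluate the normalizing term
→ apply Gibbs sampling to each element of $S$.
$$p(s_n \mid X, S_{\setminus n}) \propto p(x_n, X_{\setminus n}, s_n, S_{\setminus n}) \tag{4.64}$$
$$= p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n})\, p(X_{\setminus n} \mid s_n, S_{\setminus n})\, p(s_n \mid S_{\setminus n})\, p(S_{\setminus n}) \tag{4.65}$$
$$\propto p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n})\, p(s_n \mid S_{\setminus n}) \tag{4.66}$$
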
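Collapsed Gibbs sampling
Right-hand factor of (4.66):
$$p(s_n \mid S_{\setminus n}) = \int p(s_n \mid \pi)\, p(\pi \mid S_{\setminus n})\, d\pi \tag{4.70}$$
$$= \mathrm{Cat}(s_n \mid \eta_{\setminus n}) \tag{4.74}$$
$$\eta_{\setminus n, k} \propto \sum_{n' \ne n} s_{n', k} + \alpha_k \tag{4.75}$$
$\alpha$ is the parameter of the prior $p(\pi) = \mathrm{Dir}(\pi \mid \alpha)$.
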
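Collapsed Gibbs sampling
Left-hand factor of (4.66):
$$p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n}) = \int p(x_n \mid s_n, \lambda)\, p(\lambda \mid X_{\setminus n}, S_{\setminus n})\, d\lambda \tag{4.76}$$
Conditioning on $s_{n,k} = 1$, this integral can be carried out analytically:
$$p(x_n \mid X_{\setminus n}, s_{n,k} = 1, S_{\setminus n}) = \mathrm{NB}\left( x_n \,\middle|\, \hat{a}_{\setminus n, k}, \frac{1}{\hat{b}_{\setminus n, k} + 1} \right) \tag{4.81}$$
$$\hat{a}_{\setminus n, k} = \sum_{n' \ne n} s_{n', k}\, x_{n'} + a_k \tag{4.80}$$
$$\hat{b}_{\setminus n, k} = \sum_{n' \ne n} s_{n', k} + b_k \tag{4.81}$$
$a_k, b_k$ are the parameters of the prior $p(\lambda_k) = \mathrm{Gam}(\lambda_k \mid a_k, b_k)$.
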
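Collapsed Gibbs sampling
Concrete procedure for sampling $s_n$ from $p(s_n \mid X, S_{\setminus n})$:
1. Prepare the possible realizations of $s_n$, from $(1, 0, \dots, 0)^\top$ to $(0, 0, \dots, 1)^\top$.
2. For each of them, evaluate
$$p(s_n \mid S_{\setminus n}) = \mathrm{Cat}(s_n \mid \eta_{\setminus n}) \tag{4.74}$$
$$p(x_n \mid X_{\setminus n}, s_{n,k} = 1, S_{\setminus n}) = \mathrm{NB}\left( x_n \,\middle|\, \hat{a}_{\setminus n, k}, \frac{1}{\hat{b}_{\setminus n, k} + 1} \right) \tag{4.81}$$
3. Normalizing the resulting $K$ values (the product of the two factors for each $k$) gives the categorical distribution $p(s_n \mid X, S_{\setminus n})$.
4. Sample from the resulting $p(s_n \mid X, S_{\setminus n})$.
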
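The four steps above can be sketched for a single $s_n$ as follows. The predictive term is written directly as the Gamma-Poisson marginal $\frac{\Gamma(x + \hat{a})}{x!\,\Gamma(\hat{a})} \left(\frac{\hat{b}}{\hat{b}+1}\right)^{\hat{a}} \left(\frac{1}{\hat{b}+1}\right)^{x}$, which is what $\mathrm{NB}(x \mid \hat{a}, 1/(\hat{b}+1))$ denotes here. Assumptions for this illustration: labels are cluster indices, all clusters share one prior pair `a`, `b`, the statistics are recomputed from scratch for clarity rather than updated incrementally, and the function names are hypothetical.

```python
import math
import random


def log_predictive(x, a_hat, b_hat):
    """ln of the Gamma-Poisson predictive, i.e. NB(x | a_hat, 1 / (b_hat + 1))."""
    p = 1.0 / (b_hat + 1.0)
    return (math.lgamma(x + a_hat) - math.lgamma(x + 1) - math.lgamma(a_hat)
            + a_hat * math.log(1.0 - p) + x * math.log(p))


def resample_label(n, data, labels, a, b, alpha, rng):
    """Resample labels[n] from p(s_n | X, S without n); returns the probabilities."""
    num_clusters = len(alpha)
    sum_x = [0.0] * num_clusters
    counts = [0] * num_clusters
    for i, (x, k) in enumerate(zip(data, labels)):
        if i != n:                       # statistics exclude data point n
            sum_x[k] += x
            counts[k] += 1
    # Steps 1-2: for each candidate cluster k, prior term (4.75) times predictive
    log_w = [math.log(counts[k] + alpha[k])
             + log_predictive(data[n], sum_x[k] + a, counts[k] + b)
             for k in range(num_clusters)]
    # Step 3: normalize the K values
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    probs = [v / total for v in w]
    # Step 4: sample the new label
    labels[n] = rng.choices(range(num_clusters), weights=probs)[0]
    return probs
```

A full sweep calls `resample_label` for every `n`; an efficient implementation would keep `sum_x` and `counts` as running totals and subtract or add one data point at a time.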
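Simple experiment
Cluster estimation result for 1-D discrete non-negative data (variational inference)
(Figure: two histograms, labeled "observation" and "estimation".)
The data are separated into two color-coded clusters (one shown in blue); cluster membership probabilities are expressed by intermediate colors.
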
Simple experiment
ELBO convergence time (Fig. 4.10)
Vertical axis: ELBO; horizontal axis (log scale): computation time [µs]
(Figure: ELBO curves for VI, GS and CGS.)
Because the problem is simple, there is no difference in final accuracy.

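Simple experiment
Rough tendencies:
• Variational inference is the fastest
• Collapsed GS gives the best final accuracy
• Collapsed GS is accurate from the early iterations
Recommendation: try GS first, and if the accuracy is not satisfactory, also derive variational inference and collapsed GS.
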
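Summary
• Introduced Gibbs sampling, blocking Gibbs sampling, collapsed Gibbs sampling, and variational inference as methods for approximating the posterior distribution
• Derived Gibbs sampling, collapsed Gibbs sampling, and variational inference concretely for the Poisson mixture model
• Variational inference is the fastest to compute, collapsed Gibbs sampling is the most accurate, and Gibbs sampling is the easiest to derive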