Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ベイズ推論による機械学習入門 4章前半
Search
Takahiro Kawashima
October 01, 2018
Science
0
550
ベイズ推論による機械学習入門 4章前半
某所での輪読用資料
須山敦志『ベイズ推論による機械学習入門』4.1節〜4.3節
Takahiro Kawashima
October 01, 2018
Tweet
Share
More Decks by Takahiro Kawashima
See All by Takahiro Kawashima
論文紹介:Precise Expressions for Random Projections
wasyro
0
230
ガウス過程入門
wasyro
0
270
論文紹介:Inter-domain Gaussian Processes
wasyro
0
140
論文紹介:Proximity Variational Inference (近接性変分推論)
wasyro
0
280
機械学習のための行列式点過程:概説
wasyro
0
1.3k
SOLVE-GP: ガウス過程の新しいスパース変分推論法
wasyro
1
1k
論文紹介:Stein Variational Gradient Descent
wasyro
0
940
次元削減(主成分分析・線形判別分析・カーネル主成分分析)
wasyro
0
670
論文紹介: Supervised Principal Component Analysis
wasyro
1
770
Other Decks in Science
See All in Science
Mechanistic Interpretability の紹介
sohtakahashi
0
350
Pericarditis Comic
camkdraws
0
1.2k
多次元展開法を用いた 多値バイクラスタリング モデルの提案
kosugitti
0
190
LIMEを用いた判断根拠の可視化
kentaitakura
0
340
白金鉱業Meetup Vol.15 DMLによる条件付処置効果の推定_sotaroIZUMI_20240919
brainpadpr
1
490
(2024) Livres, Femmes et Math
mansuy
0
110
Sarcoptic Mange
uni_of_nomi
1
110
240510 COGNAC LabChat
kazh
0
130
ultraArmをモニター提供してもらった話
miura55
0
190
Celebrate UTIG: Staff and Student Awards 2024
utig
0
460
機械学習を支える連続最適化
nearme_tech
PRO
1
150
証明支援系LEANに入門しよう
unaoya
0
350
Featured
See All Featured
Ruby is Unlike a Banana
tanoku
97
11k
Designing on Purpose - Digital PM Summit 2013
jponch
115
7k
A better future with KSS
kneath
238
17k
Being A Developer After 40
akosma
87
590k
Six Lessons from altMBA
skipperchong
27
3.5k
A Philosophy of Restraint
colly
203
16k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
246
1.3M
Large-scale JavaScript Application Architecture
addyosmani
510
110k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
27
4.3k
Building Better People: How to give real-time feedback that sticks.
wjessup
364
19k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
6
420
Art, The Web, and Tiny UX
lynnandtonic
297
20k
Transcript
ਢࢁຊ 4 ষલ ౡوେ October 1, 2018 ిؾ௨৴େֶ 4
࣍ 1. ࠞ߹Ϟσϧͱࣄޙͷਪ 2. ֬ͷۙࣅख๏ 3. ϙΞιϯࠞ߹Ϟσϧʹ͓͚Δਪ 2
ࠞ߹Ϟσϧͱࣄޙͷਪ
ࠞ߹Ϟσϧͷಈػ ෳͷͷ͋͠ΘͤͰΑΓෳࡶͳϞσϧΛ ˠࠞ߹Ϟσϧ ୯ҰͷΨεϞσϧͰઆ໌Ͱ͖ͳͦ͞͏ 3
ࠞ߹Ϟσϧͷσʔλੜաఔ Ϋϥελ K ط ੜσʔλ X = {x1, . .
. , xN } જࡏม (one-hot) S = {s1, . . . , sN } ࠞ߹ൺ π = (π1, . . . , πK)⊤ ֤Ϋϥελύϥϝʔλ Θ = (θ1, . . . , θK)⊤ 4
ࠞ߹Ϟσϧͷσʔλੜաఔ p(X, S, Θ, π) = p(X|S, Θ)p(S|π)p(Θ)p(π) = [
N ∏ n=1 p(xn|sn, Θ)p(sn|π) ] [ K ∏ k=1 p(θk) ] p(π) (4.5) sn ʹΧςΰϦΧϧɼͦͷύϥϝʔλ π ʹσΟϦΫϨͰ ڞࣄલ p(sn|π) = Cat(sn|π) (4.2) p(π) = Dir(π|α) (4.3) 5
ࠞ߹Ϟσϧͷࣄޙ ਪఆ͍ͨ͠ະมͷಉ࣌ࣄޙ p(S, Θ, π|X) = p(X, S, Θ, π)
p(X) (4.6) ͞ΒʹΫϥελΛਪఆ͢Δʹ p(S|X) = ∫∫ p(S, Θ, π|X)dΘdπ (4.7) ͷܭࢉ͕ඞཁ 6
ࠞ߹Ϟσϧͷࣄޙ ਖ਼نԽ߲ p(X) ΛཅʹಘΔʹ p(X) = ∑ S ∫∫ p(X,
S, Θ, π)dΘdπ = ∑ S p(X, S) (4.8) Λܭࢉ ੵڞࣄલΛ͑ղੳతʹධՁͰ͖Δ͕ʜʜ S ͷͯ͢ͷΈ߹Θͤʹର͢Δ͕ඞཁ ˠ MCMCɼมਪͳͲͰࣄޙΛۙࣅ 7
֬ͷۙࣅख๏
ΪϒεαϯϓϦϯά ѻ͍ͮΒ͍֬ p(z1, z2, z3) ͷ౷ܭྔΛಘ͍ͨ ˠ MCMC(Markov chain Monte
Carlo) Ͱ p(z1, z2, z3) ͔Βαϯϓ Ϧϯά ΪϒεαϯϓϦϯά ҎԼͷ full conditional ͔Β܁Γฦ͠αϯϓϦϯάͯ͠ p(z1, z2, z3) ͔ΒͷαϯϓϦϯάܥྻΛಘΔ z(i) 1 ∼ p(z1|z(i−1) 2 , z(i−1) 3 ) z(i) 2 ∼ p(z2|z(i) 1 , z(i−1) 3 ) (4.10) z(i) 3 ∼ p(z3|z(i) 1 , z(i) 2 ) 8
ΪϒεαϯϓϦϯά 2 ࣍ݩΨεʹରͯ͠ΪϒεαϯϓϦϯά (ਤ 4.4) ੨ઢɿਅͷɼઢɿαϯϓϧू߹͔Βಘͨۙࣅ 2 1 0 1
2 3 4 z1 2.5 2.0 1.5 1.0 0.5 0.0 0.5 1.0 z2 p(z) q(z) 2.5 2.0 1.5 1.0 0.5 0.0 0.5 1.0 z1 0.50 0.25 0.00 0.25 0.50 0.75 1.00 1.25 1.50 z2 p(z) q(z) มؒͷ૬͕ؔେ͖͍ͱո͘͠ͳΓ͕ͪ 9
ൃలख๏ 1ɿϒϩοΩϯάΪϒεαϯϓϦϯά ϒϩοΩϯάΪϒεαϯϓϦϯά z2, z3 ͷಉ࣌Λ༻͍ͯΪϒεαϯϓϦϯά z(i) 1 ∼ p(z1|z(i−1)
2 , z(i−1) 3 ) z(i) 2 , z(i) 3 ∼ p(z2, z3|z(i) 1 ) (4.11) • z2 ͱ z3 ͷ૬͕ؔڧͯ͘͏·͍͖͍͘͢ • p(z2, z3|z(i)) ͔ΒαϯϓϦϯά͍͢͠ඞཁ 10
ൃలख๏ 2ɿ่յܕΪϒεαϯϓϦϯά ่յܕΪϒεαϯϓϦϯά z3 ΛपลԽআڈޙɼp(z1, z2) ͔ΒΪϒεαϯϓϦϯά p(z1, z2) =
∫ p(z1, z2, z3)dz3 (4.12) z(i) 1 ∼ p(z1|z(i−1) 2 ) z(i) 2 ∼ p(z2|z(i) 1 ) (4.13) • ߴԽ͕ݟࠐΊΔ • पล͕ղੳతʹٻ·Δඞཁ • Γͷม͕αϯϓϦϯά͍͢͠ܗࣜͰ͋Δඞཁ 11
มਪ ֬ p(z1, z2, z3) Λѻ͍͍ۙ͢ࣅ q(z1, z2, z3) Ͱදݱ
ˠ KL ڑ࠷খԽ qopt.(z1, z2, z3) = arg min q KL[q(z1, z2, z3)∥p(z1, z2, z3)] (4.14) มਪ q ͷදݱೳྗΛݶఆͯ͠ KL ڑΛ࠷খԽ 12
มਪ ฏۉۙࣅ ֤֬มʹಠཱੑΛԾఆ p(z1, z2, z3) ≈ q(z1)q(z2)q(z3) (4.15) q(z1),
q(z2), q(z3) Λ KL ڑ͕খ͘͞ͳΔΑ͏ஞ࣍తʹमਖ਼ Notation ⟨·⟩q(z1)q(z2)q(z3) = ⟨·⟩1,2,3 13
มਪ q(z2), q(z3) Λॴ༩ͱͯ͠ q(z1) Λ࠷దԽ qopt.(z1) = arg min
q(z1) KL[q(z1)q(z2)q(z3)∥p(z1, z2, z3)] (4.16) KL[q(z1)q(z2)q(z3)∥p(z1, z2, z3)] = − ⟨ ln p(z1, z2, z3) q(z1)q(z2)q(z3) ⟩ 1,2,3 (4.18) = − ⟨⟨ ln p(z1, z2, z3) q(z1)q(z2)q(z3) ⟩ 2,3 ⟩ 1 (4.19) = − ⟨ ⟨ln p(z1, z2, z3)⟩2,3 − ⟨ln q(z1)⟩2,3 − ⟨ln q(z2)⟩2,3 − ⟨ln q(z3)⟩2,3 ⟩ 1 (4.20) 14
มਪ ⟨ln q(z1)⟩2,3 = ln q(z1)ɼq(z1) ͱແؔͳ෦Λఆʹཧ = − ⟨⟨ln
p(z1, z2, z3)⟩2,3 − ln q(z1)⟩ 1 + const. (4.21) = − ⟨ln [exp(⟨ln p(z1, z2, z3)⟩2,3)] − ln q(z1)⟩ 1 + const. = − ⟨ ln exp(⟨ln p(z1, z2, z3)⟩2,3) ln q(z1) ⟩ 1 + const. (4.22) = KL[q(z1)∥exp{⟨ln p(z1, z2, z3)⟩2,3}] + const. (4.23) ࠷ऴతʹࣜ (4.23) ͷ࠷খ ln q(z1) = ⟨ln p(z1, z2, z3)⟩q(z2)q(z3) + const. (4.24) ͰಘΒΕΔ (q(z2), q(z3) ʹ͍ͭͯಉ༷) 15
มਪ ฏۉۙࣅʹΑΔมਪ (ΞϧΰϦζϜ 4.1) q(z2), q(z3) ΛॳظԽ for i =
1, . . . , max iter do ln q(z1) = ⟨ln p(z1, z2, z3)⟩q(z2)q(z3) + const. ln q(z2) = ⟨ln p(z1, z2, z3)⟩q(z1)q(z3) + const. ln q(z3) = ⟨ln p(z1, z2, z3)⟩q(z1)q(z2) + const. end for ͏ͪΐ ͬͱ͔͍͜͠ऴྃ݅Λઃఆ͍ͨ͠ ˠͨͱ͑ ELBO(evidence lower bound) ΛධՁج४ʹ 16
มਪ ELBO(A.4, p.233) มਪʮपลͷԼݶʯͷ࠷େԽख๏ͱͯ͠ଊ͑ΒΕΔ Xɿ؍ଌσʔλɼZɿະ؍ଌม Z ∼ q(Z) ΛԾఆ ln
p(X) = ln ∫ p(X, Z)dZ = ln ∫ q(Z) p(X, Z) q(Z) dZ ≥ ∫ q(Z)ln p(X, Z) q(Z) dZ (Jensen ͷෆࣜ) =: L[q(Z)] (A.39) 17
มਪ ࢀߟɿJensen ͷෆࣜ ҙͷ “্ʹ” ತͳؔ fɼҙͷ֬ີؔ p ʹؔͯ͠ f
(∫ y(x)p(x)dx ) ≥ ∫ f(y(x))p(x)dx (A.40) 18
มਪ ELBO(A.4, p.233) पลͷԼݶ L[q(Z)] Λ q(Z) ͷ ELBO ͱΑͿ
ରपลͱ ELBO ͱͷࠩ q(Z) ͱ p(Z|X) ͱͷ KL ڑʹ ͍͠ KL[q(Z)∥p(Z|X)] = ∫ q(Z)ln q(Z) p(Z|X) dZ = ∫ q(Z)ln q(Z)p(X) p(X, Z) dZ = p(X) − ∫ q(Z)ln p(X, Z) q(Z) dZ = p(X) − L[q(Z)] (A.41) 19
มਪ ELBO(A.4, p.233) KL[q(Z)∥p(Z|X)] = p(X) − L[q(Z)] (A.41) ln
p(X) σʔλͱϞσϧॴ༩ͷͱఆ ˠ q(Z) ʹؔ͢Δ KL ڑ࠷খԽͱରपลͷԼݶ L[q(Z)] ͷ ࠷େԽՁ ELBO ͷมԽ͕ఆ ϵ ΑΓখ͘͞ͳͬͨͱ͖ʹมਪΞϧΰ ϦζϜΛࢭΊΔ 20
มਪ ߏԽมਪ ਅͷΛ෦తʹۙࣅؔʹղ p(z1, z2, z3) ≈ q(z1)q(z2, z3) (4.26)
21
มਪ (؆қ࣮ݧ) 2 ࣍ݩΨεʹมਪΛద༻ (ਤ 4.5) 1.0 0.5 0.0 0.5
0.50 0.25 0.00 0.25 0.50 1 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 2 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 3 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 4 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 5 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 6 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 7 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 8 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 9 of 10 0.5 0.0 0.5 0.50 0.25 0.00 0.25 0.50 10 of 10 ੨ઢɿਅͷ ઢɿۙࣅࣄޙ 22
มਪ (؆қ࣮ݧ) 2 ࣍ݩΨεʹมਪΛద༻ (ਤ 4.5) 2 4 6 8
10 iteration 0.46 0.48 0.50 0.52 0.54 KL divergence KL ڑ୯ௐݮগ 23
มਪ (؆қ࣮ݧ) 2 ࣍ݩΨεʹมਪΛద༻ (ਤ 4.5) • ͍ • ΠςϨʔγϣϯ͝ͱʹ
KL ڑ͕୯ௐݮগ • ڧ͍૬ؔΛଊ͑ΒΕͳ͍ 24
ϙΞιϯࠞ߹Ϟσϧʹ͓͚Δਪ
ϙΞιϯࠞ߹Ϟσϧ 1 ࣍ݩࢄඇෛσʔλͷΫϥελΛਪఆ (ਤ 4.6) 80 100 120 140 160
180 0 20 40 60 80 100 120 observation 25
ϙΞιϯࠞ߹Ϟσϧ p(xn|λk) = Poi(xn|λk) (4.27) ΑΓ p(xn|sn, λ) = K
∏ k=1 Poi(xn|λk)sn,k (4.28) λk ͷڞࣄલ p(λk) = Gamma(λk|a, b) (4.29) 26
ΪϒεαϯϓϦϯά ࠞ߹ͰજࡏมͱύϥϝʔλΛ͚ͯαϯϓϧ͢ΔͱΑ͍ S ∼ p(S|X, λ, π) (4.31) λ, π
∼ p(λ, π|X, S) (4.32) ม S ͷΈʹண p(S|X, λ, π) ∝ p(X|S, λ)p(S|π) = N ∏ n=1 p(xn|sn, λ)p(sn|π) (4.33) 27
ΪϒεαϯϓϦϯά p(xn|sn, λ), p(sn|π) ΛͦΕͧΕܭࢉ͢Δͱɼ࠷ऴతʹ sn ∼ Cat(sn|ηn ) (4.37)
ͨͩ͠ ηn,k ∼ exp{xnln λk − λk + ln πk} ( s.t. K ∑ k=1 ηn,k = 1 ) (4.38) ͕ಘΒΕΔ 28
ΪϒεαϯϓϦϯά p(λ, π|X, S) ∝ p(X, S, λ, π) =
p(X|S, λ)p(S|π)p(λ)p(π) (4.39) ˠ λ ͱ π ͷࣄޙಠཱ λ ʹؔͷ͋Δͱ͜Ζʹ͚ͩ p(λ|X, S) ∝ p(X|S, λ)p(λ) 29
ΪϒεαϯϓϦϯά ۩ମతʹܭࢉ͍ͯ͘͠ͱ λk ∼ Gam(λk|ˆ ak,ˆ bk) (4.41) ͨͩ͠ ˆ
ak = N ∑ n=1 sn,kxn + a ˆ bk = N ∑ n=1 sn,k + b (4.42) ͱͳΔ 30
ΪϒεαϯϓϦϯά π ʹؔͷ͋Δͱ͜Ζʹ͚ͩ p(π|X, S) ∝ p(S|π)p(π) ࠷ऴతʹ π ∼
Dir(π|ˆ α) (4.44) ͨͩ͠ ˆ αk = N ∑ n=1 sn,k + αk (4.45) 31
มਪ જࡏมͱύϥϝʔλʹղ (มϕΠζ EM ΞϧΰϦζϜ) p(S, λ, π|X) ≈ q(S)q(λ,
π) (4.46) มਪͷެࣜ ln q(z1) = ⟨ln p(z1, z2, z3)⟩q(z2)q(z3) + const. (4.24) Λ༻͍Δͱ q(S) ʹؔͯ͠ ln q(S) = ⟨ln p(X, S, λ, π)⟩q(λ,π) + const. = ⟨ln p(X|S, λ)p(S|π)p(λ)p(π)⟩q(λ,π) + const. = ⟨ln p(X|S, λ)⟩q(λ) + ⟨ln p(S|π)⟩q(π) + const. = [ N ∑ n=1 ⟨ln p(xn|sn, λ)⟩q(λ) + ⟨ln p(sn|π)⟩q(π) ] + const. (4.47) 32
มਪ (4.47) ࣜ૯ͷୈ 1 ߲ ⟨ln p(xn|sn, λ)⟩q(λ) = K
∑ k=1 ⟨sn,k ln Poi(xn|λk)⟩qk = K ∑ k=1 sn,k(xn⟨ln λk⟩ − ⟨λk⟩) + const. (4.48) ୈ 2 ߲ ⟨ln p(sn|π)⟩q(π) = ⟨ln Cat(sn|π)⟩q(π) = K ∑ k=1 sn,k⟨ln πk⟩ (4.49) 33
มਪ ࣜ (4.47),(4.48),(4.49) ͔Β ln q(sn) = ⟨ln p(xn|sn, λ)⟩q(λ)
+ ⟨ln p(sn|π)⟩q(π) + const. = K ∑ k=1 sn,k(xn⟨ln λk⟩ − ⟨λk⟩ + ⟨ln πk⟩ + const.) ͜͜Ͱ ln Cat(s|π) = ∑ K k=1 sn,k ln πk ΑΓ q(sn) = Cat(sn|ηn ) (4.50) ͨͩ͠ ηn,k ∝ exp{xn⟨ln λk⟩ − ⟨λk⟩ + ⟨ln πk⟩} ( s.t. K ∑ k=1 ηn,k = 1 ) (4.51) λ, π ͷظܭࢉҰ୴͋ͱ·Θ͠ 34
มਪ ଓ͍ͯύϥϝʔλͷۙࣅ ln q(λ, π) = ⟨ln p(X, S, λ,
π)⟩q(S) + const. = ⟨ln p(X|S, λ)⟩q(S) + ln p(λ) + ⟨ln p(S|π)⟩q(S) + ln p(π) + const. ΑΓɼλ, π ͕ಠཱʹղ͞Ε͍ͯΔ͜ͱ͕Θ͔Δ ˠ q(λ, π) ͷΘΓʹ q(λ), q(π) ΛͦΕͧΕٻΊΕΑ͍ 35
มਪ q(sn) ͷͱ͖ͱಉ༷ʹܭࢉ͍ͯ͘͠ͱɼ݁Ռͱͯ͠ q(λk) = Gam(λk|ˆ ak,ˆ bk) (4.54) ͨͩ͠
ˆ ak = N ∑ n=1 ⟨sn,k⟩xn + a ˆ bk = N ∑ n=1 ⟨sn,k⟩ + b (4.55) ͓Αͼ q(π) = Dir(π|ˆ α) (4.56) ͨͩ͠ ˆ αk = N ∑ n=1 ⟨sn,k⟩ + αk (4.57) ͕ಘΒΕΔ 36
มਪ ࣜ (4.57) ͷظ ⟨sn,k⟩ = ⟨sn,k⟩q(S) ɼ q(sn) =
Cat(sn|ηn ) (4.50) ΑΓɼ ⟨sn,k⟩q(S) = ηn,k 37
มਪ q(λk) = Gam(λk|ˆ ak,ˆ bk), q(π) = Dir(π|ˆ α)
͕Θ͔ͬͨͷͰɼ ͋ͱ·Θ͠ʹ͍ͯͨ͠ q(sn) ͷظ ⟨λ⟩, ⟨ln λ⟩, ⟨ln π⟩ Λܭࢉ ͜͜Ͱ Eλ∼Gam(λ|a,b) [λ] = a b (2.59) Eλ∼Gam(λ|a,b) [ln λ] = ψ(a) − ln b (2.60) Eπ∼Dir(π|α) [ln πk] = ψ(αk) − ψ ( K ∑ l=1 αk ) (2.52) ψ(x) σΟΨϯϚؔ ψ(x) = d dx ln Γ(x) (A.26) 38
มਪ ࣜ (2.59), (2.60), (2.52) Λ༻͍ΔͱɼٻΊ͍ͨظ ⟨λk⟩ = ˆ ak
ˆ bk (4.60) ⟨ln λk⟩ = ψ(ˆ ak) − ln ˆ bk (4.61) ⟨πk⟩ = ψ(ˆ αk) − ψ ( K ∑ l=1 ˆ αk ) (4.62) ͱಘΒΕΔ 39
่յܕΪϒεαϯϓϦϯά ࠞ߹Ϟσϧͷ่յܕΪϒεαϯϓϦϯάͰಉ͔࣌Βύϥ ϝʔλΛपลԽআڈ p(X, S) = ∫∫ p(X, S, λ,
π)dλdπ (4.63) ͋ͱ p(S|X) ͔ΒαϯϓϦϯάͰ͖ΕΑ͍͕ʜʜ 40
่յܕΪϒεαϯϓϦϯά पลԽલޙͷάϥϑΟΧϧϞσϧ (ਤ 4.7) sn ͕΄͔ͷશͯͷ S ͷཁૉͱґଘؔ (શάϥϑ) 41
่յܕΪϒεαϯϓϦϯά p(S|X) = p(X|S)p(S) ∑ S p(X|S)p(S) ΑΓɼp(S|X) ͔ΒαϯϓϦϯά͢ΔʹɼؔͷධՁ ʹ
KN ճͷܭࢉ͕ඞཁ ˠ S ͷ֤ཁૉʹΪϒεαϯϓϦϯάΛద༻ p(sn|X, S\n ) ∝ p(xn, X\n , sn, S\n ) (4.64) = p(xn|X\n , sn, S\n )p(X\n |sn, S\n ) × p(sn|S\n )p(S\n ) (4.65) ∝ p(xn|X\n , sn, S\n )p(sn|S\n ) (4.66) 42
่յܕΪϒεαϯϓϦϯά (4.66) ࣜӈଆ p(sn|S\n ) = ∫ p(sn|π)p(π|S\n )dπ (4.70)
= Cat(sn|η\n ) (4.74) η\n,k ∝ ∑ n′̸=n sn′,k + αk (4.75) α ࣄલ p(π) = Dir(π|α) ͷύϥϝʔλ 43
่յܕΪϒεαϯϓϦϯά (4.66) ࣜࠨଆ p(xn|X\n , sn, S\n ) = ∫
p(xn|sn, λ)p(λ|X\n , S\n )dλ (4.76) ͜Ε sn,k = 1 Ͱ͚݅Δͱղੳతʹ࣮ߦͰ͖ͯ p(xn|X\n , sn,k = 1, S\n ) = NB ( xn ˆ a\n,k , 1 ˆ b\n,k + 1 ) (4.81) ˆ a\n,k = ∑ n′̸=n sn′,kxn′ + ak (4.80) ˆ b\n,k = ∑ n′̸=n sn′,k + bk (4.81) ak, bk ࣄલ p(λk) = Gam(λk|ak, bk) ͷύϥϝʔλ 44
่յܕΪϒεαϯϓϦϯά ۩ମతͳ p(sn|S\n ) ͔ΒͷαϯϓϦϯάखॱ 1. sn ͷ࣮ݱͱͯ͠ (1, 0,
. . . , 0)⊤ ͔Β (0, 0, . . . , 1)⊤ Λ༻ҙ 2. ͦΕͧΕʹରͯ͠ p(sn|S\n ) = Cat(sn|η\n ) (4.74) p(xn|X\n , sn,k = 1, S\n ) = NB ( xn ˆ a\n,k , 1 ˆ b\n,k + 1 ) (4.81) ΛධՁ 3. ͜ͷ K ݸͷΛਖ਼نԽ͢Δͱɼp(sn|X) Λࣔ͢ΧςΰϦΧ ϧ͕ಘΒΕΔ 4. ಘΒΕͨ p(sn|X) ͔ΒαϯϓϦϯά 45
؆қ࣮ݧ 1 ࣍ݩࢄඇෛσʔλͷΫϥελਪఆ݁Ռ (มਪ) 80 100 120 140 160 180
0 20 40 60 80 100 120 observation 80 100 120 140 160 180 0 20 40 60 80 100 120 estimation ͱ੨ͷ 2 Ϋϥελʹ Ϋϥελॴଐ֬Λதؒ৭Ͱදݱ 46
؆қ࣮ݧ ELBO ͷऩଋ࣌ؒ (ਤ 4.10) ॎ࣠ɿELBOɼԣ࣠ (ର)ɿܭࢉ࣌ؒ [µs] 10 5
10 4 10 3 computation time( s) 5400 5200 5000 4800 4600 4400 ELBO VI GS CGS ؆୯ͳͳͷͰ࠷ऴతͳਫ਼ʹ͕ࠩͳ͍ 47
؆қ࣮ݧ େ·͔ͳͱͯ͠ • ͍ͷมਪ • ࠷ऴతʹਫ਼͕ྑ͍ͷ่յܕ GS • ่յܕ GS
ΠςϨʔγϣϯॳظ͔Βߴਫ਼ ΦεεϝɿͱΓ͋͑ͣ GS Λࢼ͠ɼਫ਼ʹೲಘ͕͍͔ͳ͚ Εมਪɾ่յܕ GS ಋग़ͯ͠ΈΔ 48
·ͱΊ • ࣄޙͷۙࣅख๏ͱͯ͠ΪϒεαϯϓϦϯάɾϒϩοΩϯ άΪϒεαϯϓϦϯάɾ่յܕΪϒεαϯϓϦϯάɾมਪ Λհ • ϙΞιϯࠞ߹Ϟσϧʹରͯ͠ΪϒεαϯϓϦϯάɾ่յܕΪ ϒεαϯϓϦϯάɾมਪΛ۩ମతʹಋग़ • ܭࢉ͕͍࣌ؒͷมਪɼਫ਼͕ྑ͍ͷ่յܕΪϒε
αϯϓϦϯάɼಋग़ָ͕ͳͷΪϒεαϯϓϦϯά 49