Introduction to Machine Learning with Bayesian Inference (ベイズ推論による機械学習入門), Chapter 4, first half
Takahiro Kawashima
October 01, 2018
Science
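Slides for a reading group
Atsushi Suyama, "ベイズ推論による機械学習入門" (Introduction to Machine Learning with Bayesian Inference), Sections 4.1 to 4.3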
Transcript
Suyama's book, Chapter 4 (first half)
Takahiro Kawashima
October 1, 2018
The University of Electro-Communications
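Contents
1. Mixture models and posterior inference
2. Approximation methods for probability distributions
3. Inference in the Poisson mixture model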
Mixture models and posterior inference
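Motivation for mixture models
Build a more complex model by adding several distributions together → a mixture model.
(Figure: data that a single Gaussian model does not seem able to explain.)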
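Data generation process of the mixture model
• Number of clusters $K$: known
• Generated data $X = \{x_1, \dots, x_N\}$
• Latent variables (one-hot) $S = \{s_1, \dots, s_N\}$
• Mixing proportions $\pi = (\pi_1, \dots, \pi_K)^\top$
• Per-cluster parameters $\Theta = (\theta_1, \dots, \theta_K)^\top$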
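Data generation process of the mixture model
$$p(X, S, \Theta, \pi) = p(X \mid S, \Theta)\, p(S \mid \pi)\, p(\Theta)\, p(\pi) = \left[\prod_{n=1}^{N} p(x_n \mid s_n, \Theta)\, p(s_n \mid \pi)\right] \left[\prod_{k=1}^{K} p(\theta_k)\right] p(\pi) \quad (4.5)$$
$s_n$ follows a categorical distribution, and its parameter $\pi$ is given the conjugate Dirichlet prior:
$$p(s_n \mid \pi) = \mathrm{Cat}(s_n \mid \pi) \quad (4.2)$$
$$p(\pi) = \mathrm{Dir}(\pi \mid \alpha) \quad (4.3)$$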
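Posterior of the mixture model
The joint posterior of the unknown variables we want to estimate:
$$p(S, \Theta, \pi \mid X) = \frac{p(X, S, \Theta, \pi)}{p(X)} \quad (4.6)$$
To estimate the clusters, we further need to compute
$$p(S \mid X) = \iint p(S, \Theta, \pi \mid X)\, d\Theta\, d\pi \quad (4.7)$$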
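Posterior of the mixture model
To obtain the normalizing constant $p(X)$ explicitly, we must compute
$$p(X) = \sum_{S} \iint p(X, S, \Theta, \pi)\, d\Theta\, d\pi = \sum_{S} p(X, S) \quad (4.8)$$
With conjugate priors the integrals can be evaluated analytically, but the sum runs over every configuration of $S$ ($K^N$ terms).
→ Approximate the posterior with MCMC, variational inference, etc.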
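To make the $K^N$ cost concrete, here is a brute-force sketch for a tiny instance. It assumes the Poisson mixture of Section 4.3 (Poisson likelihood, a $\mathrm{Gam}(a, b)$ prior with rate $b$ on every $\lambda_k$, a $\mathrm{Dir}(\alpha)$ prior on $\pi$). It integrates $\lambda$ and $\pi$ out in closed form, which the slide says is possible with conjugate priors, and then sums over every label vector $S$. The function names, hyperparameter defaults, and data are illustrative assumptions, and only the standard library is used.

```python
import itertools
import math


def log_joint_collapsed(x, s, K, a, b, alpha):
    """ln p(X, S) for a Poisson mixture with lambda_k ~ Gam(a, b) (b = rate)
    and pi ~ Dir(alpha); lambda and pi are integrated out in closed form."""
    counts = [0] * K
    sums = [0] * K
    for xn, sn in zip(x, s):
        counts[sn] += 1
        sums[sn] += xn
    # ln p(S): Dirichlet-categorical marginal likelihood
    out = math.lgamma(sum(alpha)) - math.lgamma(sum(alpha) + len(x))
    out += sum(math.lgamma(alpha[k] + counts[k]) - math.lgamma(alpha[k]) for k in range(K))
    # ln p(X | S): Gamma-Poisson marginal likelihood of each cluster
    out -= sum(math.lgamma(xn + 1) for xn in x)
    for k in range(K):
        a_hat, b_hat = a + sums[k], b + counts[k]
        out += a * math.log(b) - math.lgamma(a) + math.lgamma(a_hat) - a_hat * math.log(b_hat)
    return out


def log_evidence_bruteforce(x, K, a=1.0, b=1.0, alpha=None):
    """ln p(X) = ln sum_S p(X, S) (eq. 4.8), enumerating all K**N label vectors S."""
    alpha = alpha or [1.0] * K
    terms = [log_joint_collapsed(x, s, K, a, b, alpha)
             for s in itertools.product(range(K), repeat=len(x))]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms)), len(terms)


x = [2, 3, 9, 11, 10, 1]  # made-up data
log_px, n_terms = log_evidence_bruteforce(x, K=2)
print(n_terms, log_px)  # n_terms is 2**6 = 64
```

Each extra data point multiplies the number of terms by $K$. The same loop over $N = 100$ points with $K = 2$ would already need $2^{100}$ evaluations, which is why the slides turn to MCMC and variational inference.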
Approximation methods for probability distributions
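Gibbs sampling
We want statistics of a hard-to-handle distribution $p(z_1, z_2, z_3)$.
→ Draw samples from $p(z_1, z_2, z_3)$ with MCMC (Markov chain Monte Carlo).
Gibbs sampling: repeatedly sample from the following full conditionals to obtain a sequence of samples from $p(z_1, z_2, z_3)$:
$$z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}, z_3^{(i-1)})$$
$$z_2^{(i)} \sim p(z_2 \mid z_1^{(i)}, z_3^{(i-1)}) \quad (4.10)$$
$$z_3^{(i)} \sim p(z_3 \mid z_1^{(i)}, z_2^{(i)})$$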
Gibbs sampling
Gibbs sampling applied to a 2D Gaussian (Figure 4.4). Blue: true distribution; red: approximation obtained from the sample set.
[Figure 4.4: two panels plotting $z_2$ against $z_1$, each showing $p(z)$ and $q(z)$]
When the variables are strongly correlated, sampling tends to become difficult (the chain mixes slowly).
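The following is a minimal sketch of Gibbs sampling (4.10) in the two-variable case. It assumes a 2D Gaussian target with unit variances and correlation `rho`; the slides do not give the parameters used for Figure 4.4. The full conditionals of such a Gaussian are themselves Gaussian: $z_1$ has mean $\mu_1 + \rho (z_2 - \mu_2)$ and variance $1 - \rho^2$, and $z_2$ is symmetric. The function names and the chosen `rho` values are illustrative.

```python
import random
import statistics


def gibbs_bivariate_gaussian(mu1, mu2, rho, n_iter, seed=0):
    """Gibbs sampling (eq. 4.10, two-variable case) from a 2D Gaussian with unit
    variances and correlation rho, alternating p(z1 | z2) and p(z2 | z1)."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5  # standard deviation of each full conditional
    z1, z2 = 0.0, 0.0              # arbitrary initial state
    samples = []
    for _ in range(n_iter):
        z1 = rng.gauss(mu1 + rho * (z2 - mu2), sd)
        z2 = rng.gauss(mu2 + rho * (z1 - mu1), sd)
        samples.append((z1, z2))
    return samples


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a chain; values near 1 mean slow mixing."""
    m = statistics.fmean(xs)
    num = sum((u - m) * (v - m) for u, v in zip(xs, xs[1:]))
    den = sum((u - m) ** 2 for u in xs)
    return num / den


for rho in (0.1, 0.95):
    # Drop the first 1000 draws as burn-in and keep the z1 chain.
    chain = [z1 for z1, _ in gibbs_bivariate_gaussian(1.0, -0.5, rho, 20000)[1000:]]
    print(rho, round(statistics.fmean(chain), 2), round(lag1_autocorr(chain), 2))
```

For this sampler the $z_1$ chain is an AR(1) process with coefficient $\rho^2$. Its lag-1 autocorrelation is therefore about 0.01 for $\rho = 0.1$ but about 0.9 for $\rho = 0.95$. The strongly correlated case mixes far more slowly, matching the remark on the slide.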
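Extension 1: blocked Gibbs sampling
Blocked Gibbs sampling: run Gibbs sampling using the joint conditional of $z_2$ and $z_3$:
$$z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}, z_3^{(i-1)})$$
$$z_2^{(i)}, z_3^{(i)} \sim p(z_2, z_3 \mid z_1^{(i)}) \quad (4.11)$$
• Tends to work well even when $z_2$ and $z_3$ are strongly correlated
• Requires that $p(z_2, z_3 \mid z_1^{(i)})$ be easy to sample from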
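Here is a minimal sketch of blocked Gibbs sampling (4.11). It assumes a zero-mean 3D Gaussian whose covariance `S` makes $z_2$ and $z_3$ strongly correlated; this is an illustrative choice. Both $p(z_1 \mid z_2, z_3)$ and $p(z_2, z_3 \mid z_1)$ are Gaussian. The second is sampled jointly through a hand-written 2x2 Cholesky factor, so the code needs only the standard library. `blocked_gibbs` and `corr` are hypothetical helper names.

```python
import random


def blocked_gibbs(S, n_iter, seed=0):
    """Blocked Gibbs (eq. 4.11) for a zero-mean 3D Gaussian with covariance S (nested list):
    z1 ~ p(z1 | z2, z3), then (z2, z3) ~ p(z2, z3 | z1) drawn jointly."""
    rng = random.Random(seed)
    # p(z1 | z2, z3): regression weights w = S_1b S_bb^{-1} and residual variance
    a, b, d = S[1][1], S[1][2], S[2][2]
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    w = [S[0][1] * inv[0][0] + S[0][2] * inv[1][0],
         S[0][1] * inv[0][1] + S[0][2] * inv[1][1]]
    var1 = S[0][0] - (w[0] * S[1][0] + w[1] * S[2][0])
    # p(z2, z3 | z1): regression coefficients and residual 2x2 covariance
    c = [S[1][0] / S[0][0], S[2][0] / S[0][0]]
    r22 = S[1][1] - S[1][0] ** 2 / S[0][0]
    r23 = S[1][2] - S[1][0] * S[2][0] / S[0][0]
    r33 = S[2][2] - S[2][0] ** 2 / S[0][0]
    # Cholesky factor L of the residual covariance, so that L L^T = [[r22, r23], [r23, r33]]
    l11 = r22 ** 0.5
    l21 = r23 / l11
    l22 = (r33 - l21 ** 2) ** 0.5
    z2 = z3 = 0.0
    out = []
    for _ in range(n_iter):
        z1 = rng.gauss(w[0] * z2 + w[1] * z3, var1 ** 0.5)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z2 = c[0] * z1 + l11 * e1              # joint draw of the block (z2, z3)
        z3 = c[1] * z1 + l21 * e1 + l22 * e2
        out.append((z1, z2, z3))
    return out


def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cuv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return cuv / (sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v)) ** 0.5


S = [[1.0, 0.3, 0.3],
     [0.3, 1.0, 0.95],
     [0.3, 0.95, 1.0]]
draws = blocked_gibbs(S, 20000)[1000:]
z1s, z2s, z3s = zip(*draws)
print(round(corr(z2s, z3s), 3), round(corr(z1s, z2s), 3))  # should be close to 0.95 and 0.3
```

Because $z_2$ and $z_3$ are drawn together, the sampler never has to move along their 0.95 correlation one coordinate at a time. That is the benefit listed in the first bullet.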
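Extension 2: collapsed Gibbs sampling
Collapsed Gibbs sampling: marginalize out $z_3$, then run Gibbs sampling on $p(z_1, z_2)$:
$$p(z_1, z_2) = \int p(z_1, z_2, z_3)\, dz_3 \quad (4.12)$$
$$z_1^{(i)} \sim p(z_1 \mid z_2^{(i-1)}), \qquad z_2^{(i)} \sim p(z_2 \mid z_1^{(i)}) \quad (4.13)$$
• Better performance can be expected
• The marginal must be obtainable analytically
• The remaining variables must be in a form that is easy to sample from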
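Here is a minimal sketch of collapsed Gibbs sampling (4.12)-(4.13) on a toy target, assuming a zero-mean 3D Gaussian with covariance `S`. For a Gaussian the marginal in (4.12) is available analytically: it simply keeps the rows and columns of $z_1$ and $z_2$. The second bullet's requirement is therefore met, and the sampler never touches $z_3$. The covariance values and the function name are illustrative.

```python
import random


def collapsed_gibbs_gaussian(S, n_iter, seed=0):
    """Collapsed Gibbs for a zero-mean 3D Gaussian with covariance S (nested list).
    Integrating out z3 (eq. 4.12) keeps only the top-left 2x2 block of S;
    z1 and z2 are then Gibbs-sampled from p(z1, z2) (eq. 4.13)."""
    rng = random.Random(seed)
    s11, s12, s22 = S[0][0], S[0][1], S[1][1]  # covariance of the marginal p(z1, z2)
    sd1 = (s11 - s12 * s12 / s22) ** 0.5        # std of p(z1 | z2)
    sd2 = (s22 - s12 * s12 / s11) ** 0.5        # std of p(z2 | z1)
    z1 = z2 = 0.0
    draws = []
    for _ in range(n_iter):
        z1 = rng.gauss(s12 / s22 * z2, sd1)
        z2 = rng.gauss(s12 / s11 * z1, sd2)
        draws.append((z1, z2))
    return draws


S = [[2.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.5]]
draws = collapsed_gibbs_gaussian(S, 20000)[1000:]
n = len(draws)
# Empirical second moments, roughly S[0][0] and S[0][1]
print(sum(z1 * z1 for z1, _ in draws) / n, sum(z1 * z2 for z1, z2 in draws) / n)
```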
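Variational inference
Represent the distribution $p(z_1, z_2, z_3)$ by a tractable approximation $q(z_1, z_2, z_3)$.
→ Minimize the KL divergence:
$$q_{\mathrm{opt.}}(z_1, z_2, z_3) = \operatorname*{arg\,min}_{q} \mathrm{KL}[q(z_1, z_2, z_3) \,\|\, p(z_1, z_2, z_3)] \quad (4.14)$$
Variational inference restricts the expressive power of $q$ and minimizes the KL divergence within that restricted family.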
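Variational inference: mean-field approximation
Assume the random variables are independent:
$$p(z_1, z_2, z_3) \approx q(z_1)\, q(z_2)\, q(z_3) \quad (4.15)$$
Update $q(z_1), q(z_2), q(z_3)$ one at a time so that the KL divergence decreases.
Notation: $\langle \cdot \rangle_{q(z_1)q(z_2)q(z_3)} = \langle \cdot \rangle_{1,2,3}$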
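Variational inference
Optimize $q(z_1)$ with $q(z_2), q(z_3)$ held fixed:
$$q_{\mathrm{opt.}}(z_1) = \operatorname*{arg\,min}_{q(z_1)} \mathrm{KL}[q(z_1)q(z_2)q(z_3) \,\|\, p(z_1, z_2, z_3)] \quad (4.16)$$
$$\begin{aligned}
\mathrm{KL}[q(z_1)q(z_2)q(z_3) \,\|\, p(z_1, z_2, z_3)]
&= -\left\langle \ln \frac{p(z_1, z_2, z_3)}{q(z_1)q(z_2)q(z_3)} \right\rangle_{1,2,3} \quad (4.18)\\
&= -\left\langle \left\langle \ln \frac{p(z_1, z_2, z_3)}{q(z_1)q(z_2)q(z_3)} \right\rangle_{2,3} \right\rangle_{1} \quad (4.19)\\
&= -\left\langle \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} - \langle \ln q(z_1) \rangle_{2,3} - \langle \ln q(z_2) \rangle_{2,3} - \langle \ln q(z_3) \rangle_{2,3} \right\rangle_{1} \quad (4.20)
\end{aligned}$$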
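Variational inference
Since $\langle \ln q(z_1) \rangle_{2,3} = \ln q(z_1)$, collect the terms that do not depend on $q(z_1)$ into a constant:
$$\begin{aligned}
\mathrm{KL}[q(z_1)q(z_2)q(z_3) \,\|\, p(z_1, z_2, z_3)] &= -\left\langle \langle \ln p(z_1, z_2, z_3) \rangle_{2,3} - \ln q(z_1) \right\rangle_{1} + \mathrm{const.} \quad (4.21)\\
&= -\left\langle \ln \left[\exp\left(\langle \ln p(z_1, z_2, z_3) \rangle_{2,3}\right)\right] - \ln q(z_1) \right\rangle_{1} + \mathrm{const.}\\
&= -\left\langle \ln \frac{\exp\left(\langle \ln p(z_1, z_2, z_3) \rangle_{2,3}\right)}{q(z_1)} \right\rangle_{1} + \mathrm{const.} \quad (4.22)\\
&= \mathrm{KL}\left[q(z_1) \,\|\, \exp\{\langle \ln p(z_1, z_2, z_3) \rangle_{2,3}\}\right] + \mathrm{const.} \quad (4.23)
\end{aligned}$$
The KL term reaches its minimum of zero when $q(z_1)$ is proportional to $\exp\{\langle \ln p(z_1, z_2, z_3) \rangle_{2,3}\}$, so (4.23) is minimized by
$$\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2)q(z_3)} + \mathrm{const.} \quad (4.24)$$
(and likewise for $q(z_2)$ and $q(z_3)$).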
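Variational inference
Variational inference with the mean-field approximation (Algorithm 4.1):
Initialize $q(z_2), q(z_3)$
for $i = 1, \dots, \mathrm{MAXITER}$ do
  $\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2)q(z_3)} + \mathrm{const.}$
  $\ln q(z_2) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_1)q(z_3)} + \mathrm{const.}$
  $\ln q(z_3) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_1)q(z_2)} + \mathrm{const.}$
end for
We would like a smarter stopping criterion.
→ For example, use the ELBO (evidence lower bound) as the criterion.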
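Here is a minimal sketch of Algorithm 4.1 for two variables. It assumes a 2D Gaussian target $p(z) = \mathcal{N}(z \mid \mu, \Lambda^{-1})$ with precision matrix `lam`, an illustrative choice in the spirit of Figure 4.5. For this target, (4.24) gives $q(z_1) = \mathcal{N}(z_1 \mid m_1, \Lambda_{11}^{-1})$ with $m_1 = \mu_1 - \Lambda_{11}^{-1}\Lambda_{12}(m_2 - \mu_2)$, and the same form for $z_2$. The target is normalized here, so the ELBO equals $-\mathrm{KL}[q \,\|\, p]$, and the loop stops once the ELBO improves by less than `eps`. The names and parameter values are assumptions.

```python
import math


def kl_meanfield_gaussian(m, v, mu, lam):
    """KL[q || p] for q = N(m, diag(v)) and p = N(mu, lam^{-1}) in 2D (lam = precision matrix)."""
    d0, d1 = mu[0] - m[0], mu[1] - m[1]
    quad = lam[0][0] * d0 * d0 + 2 * lam[0][1] * d0 * d1 + lam[1][1] * d1 * d1
    trace = lam[0][0] * v[0] + lam[1][1] * v[1]
    logdet_lam = math.log(lam[0][0] * lam[1][1] - lam[0][1] * lam[1][0])
    return 0.5 * (trace + quad - 2 - logdet_lam - math.log(v[0] * v[1]))


def mean_field_gaussian(mu, lam, m_init=(0.0, 0.0), max_iter=100, eps=1e-10):
    """Coordinate-wise updates of q(z1) q(z2) following Algorithm 4.1 / eq. (4.24).
    For a Gaussian target, (4.24) gives q(z_i) = N(m_i, 1 / lam_ii)."""
    v = [1.0 / lam[0][0], 1.0 / lam[1][1]]  # factor variances fixed by eq. (4.24)
    m = list(m_init)
    elbo = [-kl_meanfield_gaussian(m, v, mu, lam)]  # p is normalized, so ELBO = -KL
    for _ in range(max_iter):
        m[0] = mu[0] - lam[0][1] / lam[0][0] * (m[1] - mu[1])  # update q(z1) given q(z2)
        m[1] = mu[1] - lam[1][0] / lam[1][1] * (m[0] - mu[0])  # update q(z2) given q(z1)
        elbo.append(-kl_meanfield_gaussian(m, v, mu, lam))
        if elbo[-1] - elbo[-2] < eps:  # stop once the ELBO stops improving
            break
    return m, v, elbo


rho = 0.8
det = 1.0 - rho * rho
lam = [[1.0 / det, -rho / det], [-rho / det, 1.0 / det]]  # precision of a unit-variance 2D Gaussian
m, v, elbo = mean_field_gaussian([1.0, -1.0], lam)
print(m, v, len(elbo) - 1, -elbo[-1])
```

The means converge to $\mu$, but the factor variances stay at $1/\Lambda_{ii} = 1 - \rho^2 = 0.36$. The final KL divergence is $-\tfrac{1}{2}\ln(1 - \rho^2) \approx 0.51$, not zero.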
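Variational inference: ELBO (A.4, p.233)
Variational inference can be viewed as maximizing a lower bound on the marginal likelihood.
$X$: observed data; $Z$: unobserved variables. Assume $Z \sim q(Z)$.
$$\begin{aligned}
\ln p(X) &= \ln \int p(X, Z)\, dZ = \ln \int q(Z) \frac{p(X, Z)}{q(Z)}\, dZ\\
&\ge \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ \quad \text{(Jensen's inequality)}\\
&=: \mathcal{L}[q(Z)] \quad (A.39)
\end{aligned}$$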
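Variational inference
Reference: Jensen's inequality
For any concave ("convex upward") function $f$ and any probability density function $p$,
$$f\left(\int y(x)\, p(x)\, dx\right) \ge \int f(y(x))\, p(x)\, dx \quad (A.40)$$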
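Variational inference: ELBO (A.4, p.233)
The lower bound $\mathcal{L}[q(Z)]$ on the marginal likelihood is called the ELBO of $q(Z)$.
The gap between the log marginal likelihood and the ELBO equals the KL divergence between $q(Z)$ and $p(Z \mid X)$:
$$\begin{aligned}
\mathrm{KL}[q(Z) \,\|\, p(Z \mid X)] &= \int q(Z) \ln \frac{q(Z)}{p(Z \mid X)}\, dZ = \int q(Z) \ln \frac{q(Z)\, p(X)}{p(X, Z)}\, dZ\\
&= \ln p(X) - \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ = \ln p(X) - \mathcal{L}[q(Z)] \quad (A.41)
\end{aligned}$$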
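Variational inference: ELBO (A.4, p.233)
$$\mathrm{KL}[q(Z) \,\|\, p(Z \mid X)] = \ln p(X) - \mathcal{L}[q(Z)] \quad (A.41)$$
$\ln p(X)$ is constant once the data and the model are fixed.
→ Minimizing the KL divergence with respect to $q(Z)$ is equivalent to maximizing the lower bound $\mathcal{L}[q(Z)]$ on the log marginal likelihood.
Stop the variational inference algorithm when the change in the ELBO falls below a fixed threshold $\epsilon$.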
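Here is a small numerical check of the identity $\ln p(X) = \mathcal{L}[q(Z)] + \mathrm{KL}[q(Z) \,\|\, p(Z \mid X)]$. It assumes a discrete latent variable with three states, so that every integral becomes a sum. The `joint` values are made up for illustration; any positive numbers work. Whatever $q$ is, the two terms add up to $\ln p(X)$ and the ELBO never exceeds it. The gap closes exactly when $q$ equals the posterior.

```python
import math


def evidence_decomposition(joint, q):
    """For a discrete latent Z: joint[z] = p(X, Z=z) at the observed X, q[z] = q(Z=z).
    Returns ln p(X), the ELBO L[q] (A.39) and KL[q || p(Z | X)]."""
    px = sum(joint)
    elbo = sum(qz * math.log(pz / qz) for qz, pz in zip(q, joint) if qz > 0)
    kl = sum(qz * math.log(qz / (pz / px)) for qz, pz in zip(q, joint) if qz > 0)
    return math.log(px), elbo, kl


joint = [0.02, 0.05, 0.03]  # made-up values of p(X, Z = z) for z = 0, 1, 2
for q in ([1 / 3, 1 / 3, 1 / 3], [0.2, 0.5, 0.3]):  # the second q is the exact posterior
    log_px, elbo, kl = evidence_decomposition(joint, q)
    print(round(log_px, 4), round(elbo, 4), round(kl, 4), math.isclose(log_px, elbo + kl))
```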
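Variational inference: structured variational inference
Factorize the true distribution only partially into approximating functions:
$$p(z_1, z_2, z_3) \approx q(z_1)\, q(z_2, z_3) \quad (4.26)$$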
Variational inference (simple experiment)
Variational inference applied to a 2D Gaussian (Figure 4.5).
[Figure 4.5: ten panels, "1 of 10" to "10 of 10", one per iteration]
Blue: true distribution; red: approximate posterior.
Variational inference (simple experiment)
Variational inference applied to a 2D Gaussian (Figure 4.5).
[Plot: KL divergence against iteration]
The KL divergence decreases monotonically.
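Variational inference (simple experiment)
Variational inference applied to a 2D Gaussian (Figure 4.5)
• Fast
• The KL divergence decreases monotonically with each iteration
• Cannot capture strong correlations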
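The third bullet can be quantified with a short worked example. It assumes a target with unit variances and correlation $\rho$, chosen here for illustration; the slides do not state the parameters behind Figure 4.5. For a Gaussian target with $\Lambda = \Sigma^{-1}$, (4.24) yields Gaussian factors with variance $1/\Lambda_{ii}$, and at convergence their means equal $\mu_i$:

$$\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \Lambda = \Sigma^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}, \qquad q(z_i) = \mathcal{N}\!\left(z_i \,\middle|\, \mu_i, \tfrac{1}{\Lambda_{ii}}\right) = \mathcal{N}(z_i \mid \mu_i, 1-\rho^2)$$

$$\mathrm{KL}[q \,\|\, p] = \frac{1}{2}\left[\operatorname{tr}(\Lambda \Sigma_q) - 2 + \ln\frac{\det \Sigma}{\det \Sigma_q}\right] = \frac{1}{2}\left[2 - 2 + \ln\frac{1-\rho^2}{(1-\rho^2)^2}\right] = -\frac{1}{2}\ln(1-\rho^2)$$

The factorized approximation therefore reports a marginal variance of $1 - \rho^2$ instead of 1, and no correlation at all. The residual KL divergence is about 0.14, 0.83, and 1.96 for $\rho = 0.5, 0.9, 0.99$, and it grows without bound as $\rho \to 1$.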
Inference in the Poisson mixture model
Poisson mixture model
Estimate the clusters of one-dimensional, discrete, non-negative data (Figure 4.6).
[Figure 4.6: histogram of the observations, labeled "observation"]
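Poisson mixture model
$$p(x_n \mid \lambda_k) = \mathrm{Poi}(x_n \mid \lambda_k) \quad (4.27)$$
so that
$$p(x_n \mid s_n, \lambda) = \prod_{k=1}^{K} \mathrm{Poi}(x_n \mid \lambda_k)^{s_{n,k}} \quad (4.28)$$
Conjugate prior on $\lambda_k$:
$$p(\lambda_k) = \mathrm{Gam}(\lambda_k \mid a, b) \quad (4.29)$$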
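Here is a minimal sketch of the generative process (4.2), (4.3), (4.27)-(4.29) by ancestral sampling. It rests on several assumptions. $s_n$ is stored as a class label rather than a one-hot vector. $b$ in $\mathrm{Gam}(a, b)$ is a rate, so the scale passed to `random.gammavariate` is $1/b$. The Dirichlet draw is built from normalized Gamma variates. Poisson variates come from a simple multiplication method, because the sketch avoids third-party packages such as NumPy. The hyperparameter values in the usage line are arbitrary.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for rates below a few hundred)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_poisson_mixture(N, K, a, b, alpha, seed=0):
    """Ancestral sampling from the Poisson mixture: pi and lambda first, then s_n and x_n."""
    rng = random.Random(seed)
    g = [rng.gammavariate(alpha_k, 1.0) for alpha_k in alpha]
    total = sum(g)
    pi = [gk / total for gk in g]                            # pi ~ Dir(alpha), eq. (4.3)
    lam = [rng.gammavariate(a, 1.0 / b) for _ in range(K)]   # lambda_k ~ Gam(a, b), b = rate, eq. (4.29)
    s = rng.choices(range(K), weights=pi, k=N)               # s_n ~ Cat(pi) as a label, eq. (4.2)
    x = [sample_poisson(lam[k], rng) for k in s]             # x_n ~ Poi(lambda_{s_n}), eq. (4.28)
    return x, s, lam, pi


x, s, lam, pi = sample_poisson_mixture(N=200, K=2, a=40.0, b=1.0, alpha=[1.0, 1.0])
print(lam, pi, x[:10])
```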
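Gibbs sampling
For the mixture model, it suffices to sample the latent variables and the parameters separately:
$$S \sim p(S \mid X, \lambda, \pi) \quad (4.31)$$
$$\lambda, \pi \sim p(\lambda, \pi \mid X, S) \quad (4.32)$$
Focusing on the variable $S$ alone:
$$p(S \mid X, \lambda, \pi) \propto p(X \mid S, \lambda)\, p(S \mid \pi) = \prod_{n=1}^{N} p(x_n \mid s_n, \lambda)\, p(s_n \mid \pi) \quad (4.33)$$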
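Gibbs sampling
Computing $p(x_n \mid s_n, \lambda)$ and $p(s_n \mid \pi)$ yields, in the end,
$$s_n \sim \mathrm{Cat}(s_n \mid \eta_n) \quad (4.37)$$
where
$$\eta_{n,k} \propto \exp\{x_n \ln \lambda_k - \lambda_k + \ln \pi_k\} \quad \left(\text{s.t. } \sum_{k=1}^{K} \eta_{n,k} = 1\right) \quad (4.38)$$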
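Gibbs sampling
$$p(\lambda, \pi \mid X, S) \propto p(X, S, \lambda, \pi) = p(X \mid S, \lambda)\, p(S \mid \pi)\, p(\lambda)\, p(\pi) \quad (4.39)$$
→ $\lambda$ and $\pi$ are independent in the posterior.
Keeping only the factors that involve $\lambda$:
$$p(\lambda \mid X, S) \propto p(X \mid S, \lambda)\, p(\lambda)$$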
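Gibbs sampling
Working this out concretely gives
$$\lambda_k \sim \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k) \quad (4.41)$$
where
$$\hat{a}_k = \sum_{n=1}^{N} s_{n,k} x_n + a, \qquad \hat{b}_k = \sum_{n=1}^{N} s_{n,k} + b \quad (4.42)$$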
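Gibbs sampling
Keeping only the factors that involve $\pi$: $p(\pi \mid X, S) \propto p(S \mid \pi)\, p(\pi)$. In the end,
$$\pi \sim \mathrm{Dir}(\pi \mid \hat{\alpha}) \quad (4.44)$$
where
$$\hat{\alpha}_k = \sum_{n=1}^{N} s_{n,k} + \alpha_k \quad (4.45)$$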
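Here is a minimal sketch of the Gibbs sampler (4.31)-(4.32). It uses the conditionals derived on the preceding slides: (4.37)-(4.38) for each $s_n$, (4.41)-(4.42) for each $\lambda_k$, and (4.44)-(4.45) for $\pi$. It assumes labels instead of one-hot vectors and initial $\lambda$ and $\pi$ drawn from their priors. It also assumes hyperparameters $a = 1$, $b = 0.01$, $\alpha_k = 1$, a fixed number of sweeps, and a made-up data set with two well-separated groups. Shifting the logits by their maximum before exponentiating keeps the normalization of $\eta_n$ numerically stable.

```python
import math
import random


def gibbs_poisson_mixture(x, K, a=1.0, b=0.01, alpha=None, n_iter=200, seed=0):
    """Gibbs sampler alternating eqs. (4.37)-(4.38), (4.41)-(4.42) and (4.44)-(4.45).
    Gam(a, b) uses b as a rate, hence the scale 1 / b passed to gammavariate."""
    rng = random.Random(seed)
    alpha = alpha or [1.0] * K
    lam = [rng.gammavariate(a, 1.0 / b) for _ in range(K)]  # initial lambda drawn from the prior
    g = [rng.gammavariate(alpha_k, 1.0) for alpha_k in alpha]
    pi = [gk / sum(g) for gk in g]                           # initial pi drawn from the prior
    s = []
    for _ in range(n_iter):
        # s_n ~ Cat(eta_n), eta_nk proportional to exp(x_n ln lambda_k - lambda_k + ln pi_k)
        s = []
        for xn in x:
            logits = [xn * math.log(lam[k]) - lam[k] + math.log(pi[k]) for k in range(K)]
            top = max(logits)
            s.append(rng.choices(range(K), weights=[math.exp(l - top) for l in logits])[0])
        counts = [s.count(k) for k in range(K)]
        sums = [sum(xn for xn, sn in zip(x, s) if sn == k) for k in range(K)]
        # lambda_k ~ Gam(sum_n s_nk x_n + a, sum_n s_nk + b)
        lam = [rng.gammavariate(sums[k] + a, 1.0 / (counts[k] + b)) for k in range(K)]
        # pi ~ Dir(alpha_hat) with alpha_hat_k = sum_n s_nk + alpha_k
        g = [rng.gammavariate(counts[k] + alpha[k], 1.0) for k in range(K)]
        pi = [gk / sum(g) for gk in g]
    return s, lam, pi


x = [18, 19, 20, 21, 22] * 10 + [78, 79, 80, 81, 82] * 10  # toy data: two well-separated groups
s, lam, pi = gibbs_poisson_mixture(x, K=2)
print(sorted(round(v, 1) for v in lam), pi)
```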
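Variational inference
Factorize into latent variables and parameters (the variational Bayesian EM algorithm):
$$p(S, \lambda, \pi \mid X) \approx q(S)\, q(\lambda, \pi) \quad (4.46)$$
Applying the variational-inference formula
$$\ln q(z_1) = \langle \ln p(z_1, z_2, z_3) \rangle_{q(z_2)q(z_3)} + \mathrm{const.} \quad (4.24)$$
to $q(S)$ gives
$$\begin{aligned}
\ln q(S) &= \langle \ln p(X, S, \lambda, \pi) \rangle_{q(\lambda, \pi)} + \mathrm{const.}\\
&= \langle \ln p(X \mid S, \lambda)\, p(S \mid \pi)\, p(\lambda)\, p(\pi) \rangle_{q(\lambda, \pi)} + \mathrm{const.}\\
&= \langle \ln p(X \mid S, \lambda) \rangle_{q(\lambda)} + \langle \ln p(S \mid \pi) \rangle_{q(\pi)} + \mathrm{const.}\\
&= \sum_{n=1}^{N} \left[ \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} + \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} \right] + \mathrm{const.} \quad (4.47)
\end{aligned}$$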
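Variational inference
First term of the sum in (4.47):
$$\langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} = \sum_{k=1}^{K} \langle s_{n,k} \ln \mathrm{Poi}(x_n \mid \lambda_k) \rangle_{q(\lambda_k)} = \sum_{k=1}^{K} s_{n,k} \left(x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle\right) + \mathrm{const.} \quad (4.48)$$
Second term:
$$\langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} = \langle \ln \mathrm{Cat}(s_n \mid \pi) \rangle_{q(\pi)} = \sum_{k=1}^{K} s_{n,k} \langle \ln \pi_k \rangle \quad (4.49)$$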
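Variational inference
From (4.47), (4.48), and (4.49),
$$\ln q(s_n) = \langle \ln p(x_n \mid s_n, \lambda) \rangle_{q(\lambda)} + \langle \ln p(s_n \mid \pi) \rangle_{q(\pi)} + \mathrm{const.} = \sum_{k=1}^{K} s_{n,k} \left(x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle + \langle \ln \pi_k \rangle\right) + \mathrm{const.}$$
Since $\ln \mathrm{Cat}(s_n \mid \pi) = \sum_{k=1}^{K} s_{n,k} \ln \pi_k$,
$$q(s_n) = \mathrm{Cat}(s_n \mid \eta_n) \quad (4.50)$$
where
$$\eta_{n,k} \propto \exp\{x_n \langle \ln \lambda_k \rangle - \langle \lambda_k \rangle + \langle \ln \pi_k \rangle\} \quad \left(\text{s.t. } \sum_{k=1}^{K} \eta_{n,k} = 1\right) \quad (4.51)$$
The expectations over $\lambda$ and $\pi$ are postponed for now.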
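Variational inference
Next, the approximation for the parameters:
$$\begin{aligned}
\ln q(\lambda, \pi) &= \langle \ln p(X, S, \lambda, \pi) \rangle_{q(S)} + \mathrm{const.}\\
&= \langle \ln p(X \mid S, \lambda) \rangle_{q(S)} + \ln p(\lambda) + \langle \ln p(S \mid \pi) \rangle_{q(S)} + \ln p(\pi) + \mathrm{const.}
\end{aligned}$$
The right-hand side splits into terms that depend only on $\lambda$ and terms that depend only on $\pi$, so $\lambda$ and $\pi$ factorize independently.
→ Instead of $q(\lambda, \pi)$, we can compute $q(\lambda)$ and $q(\pi)$ separately.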
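Variational inference
Computing in the same way as for $q(s_n)$, we obtain
$$q(\lambda_k) = \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k) \quad (4.54)$$
where
$$\hat{a}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle x_n + a, \qquad \hat{b}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle + b \quad (4.55)$$
and
$$q(\pi) = \mathrm{Dir}(\pi \mid \hat{\alpha}) \quad (4.56)$$
where
$$\hat{\alpha}_k = \sum_{n=1}^{N} \langle s_{n,k} \rangle + \alpha_k \quad (4.57)$$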
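Variational inference
For the expectation $\langle s_{n,k} \rangle = \langle s_{n,k} \rangle_{q(S)}$ in (4.55) and (4.57), since $q(s_n) = \mathrm{Cat}(s_n \mid \eta_n)$ (4.50),
$$\langle s_{n,k} \rangle_{q(S)} = \eta_{n,k}$$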
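Variational inference
Now that $q(\lambda_k) = \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k)$ and $q(\pi) = \mathrm{Dir}(\pi \mid \hat{\alpha})$ are known, compute the expectations $\langle \lambda \rangle$, $\langle \ln \lambda \rangle$, $\langle \ln \pi \rangle$ that were postponed for $q(s_n)$. Here
$$\mathbb{E}_{\lambda \sim \mathrm{Gam}(\lambda \mid a, b)}[\lambda] = \frac{a}{b} \quad (2.59)$$
$$\mathbb{E}_{\lambda \sim \mathrm{Gam}(\lambda \mid a, b)}[\ln \lambda] = \psi(a) - \ln b \quad (2.60)$$
$$\mathbb{E}_{\pi \sim \mathrm{Dir}(\pi \mid \alpha)}[\ln \pi_k] = \psi(\alpha_k) - \psi\left(\sum_{l=1}^{K} \alpha_l\right) \quad (2.52)$$
where $\psi(x)$ is the digamma function:
$$\psi(x) = \frac{d}{dx} \ln \Gamma(x) \quad (A.26)$$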
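Variational inference
Using (2.59), (2.60), and (2.52), the required expectations are
$$\langle \lambda_k \rangle = \frac{\hat{a}_k}{\hat{b}_k} \quad (4.60)$$
$$\langle \ln \lambda_k \rangle = \psi(\hat{a}_k) - \ln \hat{b}_k \quad (4.61)$$
$$\langle \ln \pi_k \rangle = \psi(\hat{\alpha}_k) - \psi\left(\sum_{l=1}^{K} \hat{\alpha}_l\right) \quad (4.62)$$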
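Here is a minimal sketch of the variational updates (4.50)-(4.51) and (4.54)-(4.57), together with the expectations (4.60)-(4.62). Python's standard library has no digamma function, so `digamma` below is a hand-written approximation: it applies the recurrence until $x \ge 6$ and then the standard asymptotic series. With SciPy one would use `scipy.special.digamma` instead. The initialization, which places $\langle \lambda_k \rangle$ at evenly spaced data quantiles, is an assumption, as are the hyperparameters, the fixed iteration count, and the toy data.

```python
import math


def digamma(x):
    """psi(x) for x > 0: recurrence psi(x) = psi(x + 1) - 1/x up to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def vi_poisson_mixture(x, K, a=1.0, b=0.01, alpha=None, n_iter=100):
    """Mean-field VI q(S) q(lambda) q(pi) for the Poisson mixture (Gam(a, b) with b = rate)."""
    alpha = alpha or [1.0] * K
    xs = sorted(x)
    # Assumed initialisation: <lambda_k> placed at evenly spaced quantiles of the data
    a_hat = [xs[int((k + 0.5) * len(xs) / K)] + a for k in range(K)]
    b_hat = [1.0 + b] * K
    alpha_hat = list(alpha)
    eta = []
    for _ in range(n_iter):
        e_lam = [a_hat[k] / b_hat[k] for k in range(K)]                      # (4.60)
        e_ln_lam = [digamma(a_hat[k]) - math.log(b_hat[k]) for k in range(K)]  # (4.61)
        psi_sum = digamma(sum(alpha_hat))
        e_ln_pi = [digamma(alpha_hat[k]) - psi_sum for k in range(K)]          # (4.62)
        eta = []
        for xn in x:                                                           # (4.50)-(4.51)
            logits = [xn * e_ln_lam[k] - e_lam[k] + e_ln_pi[k] for k in range(K)]
            top = max(logits)
            w = [math.exp(l - top) for l in logits]
            eta.append([wk / sum(w) for wk in w])
        nk = [sum(row[k] for row in eta) for k in range(K)]                   # sum_n <s_nk>
        a_hat = [sum(row[k] * xn for row, xn in zip(eta, x)) + a for k in range(K)]  # (4.55)
        b_hat = [nk[k] + b for k in range(K)]                                        # (4.55)
        alpha_hat = [nk[k] + alpha[k] for k in range(K)]                             # (4.57)
    return eta, a_hat, b_hat, alpha_hat


x = [18, 19, 20, 21, 22] * 10 + [78, 79, 80, 81, 82] * 10  # toy data: two well-separated groups
eta, a_hat, b_hat, alpha_hat = vi_poisson_mixture(x, K=2)
print([round(ah / bh, 2) for ah, bh in zip(a_hat, b_hat)], alpha_hat)
```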
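Collapsed Gibbs sampling
In collapsed Gibbs sampling for the mixture model, marginalize out both parameters at once:
$$p(X, S) = \iint p(X, S, \lambda, \pi)\, d\lambda\, d\pi \quad (4.63)$$
Then we would only need to sample from $p(S \mid X)$, but...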
Collapsed Gibbs sampling
Graphical models before and after marginalization (Figure 4.7):
$s_n$ becomes dependent on all the other elements of $S$ (a complete graph).
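Collapsed Gibbs sampling
Since
$$p(S \mid X) = \frac{p(X \mid S)\, p(S)}{\sum_{S} p(X \mid S)\, p(S)},$$
sampling from $p(S \mid X)$ requires $K^N$ function evaluations.
→ Apply Gibbs sampling to each element of $S$:
$$\begin{aligned}
p(s_n \mid X, S_{\setminus n}) &\propto p(x_n, X_{\setminus n}, s_n, S_{\setminus n}) \quad (4.64)\\
&= p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n})\, p(X_{\setminus n} \mid s_n, S_{\setminus n})\, p(s_n \mid S_{\setminus n})\, p(S_{\setminus n}) \quad (4.65)\\
&\propto p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n})\, p(s_n \mid S_{\setminus n}) \quad (4.66)
\end{aligned}$$
The last step drops $p(S_{\setminus n})$ and $p(X_{\setminus n} \mid s_n, S_{\setminus n}) = p(X_{\setminus n} \mid S_{\setminus n})$, neither of which depends on $s_n$.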
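Collapsed Gibbs sampling
Right-hand factor of (4.66):
$$p(s_n \mid S_{\setminus n}) = \int p(s_n \mid \pi)\, p(\pi \mid S_{\setminus n})\, d\pi \quad (4.70)$$
$$= \mathrm{Cat}(s_n \mid \eta_{\setminus n}) \quad (4.74)$$
$$\eta_{\setminus n, k} \propto \sum_{n' \ne n} s_{n',k} + \alpha_k \quad (4.75)$$
where $\alpha$ is the parameter of the prior $p(\pi) = \mathrm{Dir}(\pi \mid \alpha)$.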
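Collapsed Gibbs sampling
Left-hand factor of (4.66):
$$p(x_n \mid X_{\setminus n}, s_n, S_{\setminus n}) = \int p(x_n \mid s_n, \lambda)\, p(\lambda \mid X_{\setminus n}, S_{\setminus n})\, d\lambda \quad (4.76)$$
After conditioning on $s_{n,k} = 1$, this integral can be carried out analytically:
$$p(x_n \mid X_{\setminus n}, s_{n,k} = 1, S_{\setminus n}) = \mathrm{NB}\left(x_n \,\middle|\, \hat{a}_{\setminus n,k}, \frac{1}{\hat{b}_{\setminus n,k} + 1}\right) \quad (4.81)$$
$$\hat{a}_{\setminus n,k} = \sum_{n' \ne n} s_{n',k} x_{n'} + a_k \quad (4.80)$$
$$\hat{b}_{\setminus n,k} = \sum_{n' \ne n} s_{n',k} + b_k \quad (4.81)$$
where $a_k, b_k$ are the parameters of the prior $p(\lambda_k) = \mathrm{Gam}(\lambda_k \mid a_k, b_k)$.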
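Collapsed Gibbs sampling
Concrete procedure for sampling $s_n$ from $p(s_n \mid X, S_{\setminus n})$:
1. Prepare the candidate values of $s_n$, from $(1, 0, \dots, 0)^\top$ to $(0, 0, \dots, 1)^\top$.
2. For each candidate, evaluate the product of
$$p(s_n \mid S_{\setminus n}) = \mathrm{Cat}(s_n \mid \eta_{\setminus n}) \quad (4.74)$$
$$p(x_n \mid X_{\setminus n}, s_{n,k} = 1, S_{\setminus n}) = \mathrm{NB}\left(x_n \,\middle|\, \hat{a}_{\setminus n,k}, \frac{1}{\hat{b}_{\setminus n,k} + 1}\right) \quad (4.81)$$
3. Normalizing these $K$ values gives the categorical distribution $p(s_n \mid X, S_{\setminus n})$.
4. Sample $s_n$ from the resulting $p(s_n \mid X, S_{\setminus n})$.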
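Here is a minimal sketch of the four-step procedure above, applied to every $s_n$ in turn. It assumes labels instead of one-hot vectors, a shared prior with $a_k = a$, $b_k = b$ and $\alpha_k = 1$, a random initial assignment, and a made-up data set with two well-separated groups. The negative binomial is written as $\mathrm{NB}(x \mid r, p) = \frac{\Gamma(x + r)}{x!\,\Gamma(r)} (1 - p)^r p^x$. Under this parameterization, (4.81) with $p = 1/(\hat{b}_{\setminus n,k} + 1)$ equals the Gamma-Poisson predictive; that this matches the book's convention is an assumption.

```python
import math
import random


def collapsed_gibbs_poisson_mixture(x, K, a=1.0, b=0.01, alpha=None, n_iter=100, seed=0):
    """Collapsed Gibbs for the Poisson mixture: lambda and pi are integrated out, and each s_n
    is drawn from p(s_n | X, S_\\n), proportional to NB(x_n | ...) * (N_k\\n + alpha_k)."""
    rng = random.Random(seed)
    alpha = alpha or [1.0] * K
    s = [rng.randrange(K) for _ in x]  # random initial assignments (assumed)
    counts = [s.count(k) for k in range(K)]
    sums = [sum(xn for xn, sn in zip(x, s) if sn == k) for k in range(K)]
    for _ in range(n_iter):
        for n, xn in enumerate(x):
            counts[s[n]] -= 1  # remove x_n to obtain the "\n" statistics
            sums[s[n]] -= xn
            logw = []
            for k in range(K):
                a_hat = sums[k] + a    # (4.80)
                b_hat = counts[k] + b  # (4.81)
                log_nb = (math.lgamma(xn + a_hat) - math.lgamma(xn + 1) - math.lgamma(a_hat)
                          + a_hat * math.log(b_hat / (b_hat + 1)) - xn * math.log(b_hat + 1))
                logw.append(math.log(counts[k] + alpha[k]) + log_nb)  # (4.74)-(4.75) times NB
            top = max(logw)
            s[n] = rng.choices(range(K), weights=[math.exp(l - top) for l in logw])[0]
            counts[s[n]] += 1  # add x_n back under its new label
            sums[s[n]] += xn
    return s


x = [18, 19, 20, 21, 22] * 10 + [78, 79, 80, 81, 82] * 10  # toy data: two well-separated groups
labels = collapsed_gibbs_poisson_mixture(x, K=2)
print(labels[:5], labels[-5:])
```

Removing and re-adding $x_n$ only touches the running counts and sums. Each sweep therefore costs $O(NK)$, instead of the $K^N$ terms of exact enumeration.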
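Simple experiment
Clustering results for one-dimensional, discrete, non-negative data (variational inference).
[Figure: two histograms, labeled "observation" and "estimation"]
The data are split into two clusters, red and blue; intermediate colors show the cluster-membership probability.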
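Simple experiment
Convergence time of the ELBO (Figure 4.10). Vertical axis: ELBO; horizontal axis (log scale): computation time [µs].
[Plot: ELBO against computation time for VI, GS, and CGS]
Because the problem is simple, the methods reach the same final accuracy.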
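Simple experiment
Rough takeaways:
• Fastest: variational inference
• Best final accuracy: collapsed GS
• Collapsed GS is accurate from the early iterations
Recommendation: start with plain GS; if its accuracy is unsatisfactory, derive variational inference or collapsed GS.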
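Summary
• Introduced Gibbs sampling, blocked Gibbs sampling, collapsed Gibbs sampling, and variational inference as methods for approximating posteriors
• Derived Gibbs sampling, collapsed Gibbs sampling, and variational inference concretely for the Poisson mixture model
• Variational inference has the shortest computation time, collapsed Gibbs sampling gives the best accuracy, and plain Gibbs sampling is the easiest to derive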