Bayesian Deep Learning (5.1 to 5.2)
catla
February 28, 2020
Contents: Bayesian neural networks (Section 5.1) and speeding up approximate Bayesian inference (Section 5.2).
Transcript
Bayesian Deep Learning (Sections 5.1 to 5.2)
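Today's contents
‣ Approximate inference for Bayesian neural network models
  ‣ Bayesian neural network models
  ‣ Learning with the Laplace approximation
  ‣ Hamiltonian Monte Carlo
‣ Making approximate Bayesian inference more efficient
  ‣ Learning with stochastic gradient Langevin dynamics
  ‣ Learning with stochastic variational inference
    ‣ Monte Carlo approximation of gradients
    ‣ Variational inference with approximate gradients
  ‣ Learning with expectation propagation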
Approximate inference for Bayesian neural network models
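Bayesian neural network models
The approximate inference methods introduced in the earlier chapter also apply to deep learning models. As with the linear regression model, we make a feedforward neural network (NN) Bayesian.
⟹ Place a prior distribution on the parameters $W$, which makes probabilistic learning and prediction possible.
Learning and prediction in Bayesian inference
‣ Joint distribution: $p(Y, W \mid X) = p(W) \prod_{n=1}^{N} p(y_n \mid x_n, W)$
‣ Learning: evaluate the posterior $p(W \mid X, Y)$.
‣ Prediction: compute the predictive distribution $p(y_* \mid x_*, Y, X)$.
[Figure: graphical model in which $W$ and $x_n$ generate $y_n$ for $n = 1, \dots, N$]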
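Bayesian neural network models: setup
For input data $X = \{x_1, \dots, x_N\}$ and observations $Y = \{y_1, \dots, y_N\}$, the joint distribution with the parameters is
$$p(Y, W \mid X) = p(W) \prod_{n=1}^{N} p(y_n \mid x_n, W).$$
The observations are assumed to be drawn from
$$p(y_n \mid x_n, W) = \mathcal{N}(y_n \mid f(x_n; W), \sigma_y^2 I),$$
where $f(x_n; W)$ is the function computed by the neural network and $\sigma_y^2$ is a fixed noise parameter.
Each parameter is assumed to be drawn from
$$p(w) = \mathcal{N}(w \mid 0, \sigma_w^2), \quad w \in W,$$
where $\sigma_w^2$ is a fixed noise parameter.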
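Bayesian neural network models: characteristics
For the network configuration considered on the slide:
‣ Many hidden units ⟶ the function becomes more complex.
‣ Large $\sigma_w$ ⟶ the function changes more sharply.
For Bayesian NNs, the posterior distribution is known to become more complex as the number of hidden units grows.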
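Learning with the Laplace approximation
Laplace approximation:
$$p(Z \mid X) \approx \mathcal{N}(Z \mid Z_{MAP}, \{\Lambda(Z_{MAP})\}^{-1}), \quad \Lambda(Z) = -\nabla_Z^2 \log p(Z \mid X)$$
For simplicity, the NN output is taken to be one-dimensional.
Approximating the posterior
First find the MAP estimate, that is, the parameter $W_{MAP}$ that maximizes $p(W \mid Y, X)$. Maximizing the posterior is equivalent to maximizing the log posterior, so the MAP estimate can be found by gradient ascent on the log posterior:
$$W_{new} = W_{old} + \alpha \nabla_W \log p(W \mid Y, X) \big|_{W = W_{old}}$$
where $\alpha$ is the learning rate.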
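Learning with the Laplace approximation: approximating the posterior
The gradient of the log posterior is obtained as follows. By Bayes' theorem,
$$p(W \mid Y, X) = \frac{p(W)\, p(Y \mid X, W)}{p(Y \mid X)} \propto p(W)\, p(Y \mid X, W),$$
and therefore
$$\log p(W \mid Y, X) = \log p(Y \mid X, W) + \log p(W) + c = \sum_{n=1}^{N} \log p(y_n \mid x_n, W) + \sum_{w \in W} \log p(w) + c.$$
Taking the partial derivative with respect to a parameter $w \in W$ gives the derivative of a cost function:
$$\frac{\partial}{\partial w} \log p(W \mid Y, X) = -\left\{ \frac{1}{\sigma_y^2} \frac{\partial}{\partial w} E(W) + \frac{1}{\sigma_w^2} \frac{\partial}{\partial w} \Omega_{L2}(W) \right\}$$
Here $E(W)$ is the error function of the NN and $\Omega_{L2}(W)$ is the regularization term that comes from the prior on each parameter. For the Gaussian likelihood and prior above, these are $E(W) = \frac{1}{2} \sum_{n} \{y_n - f(x_n; W)\}^2$ and $\Omega_{L2}(W) = \frac{1}{2} \sum_{w \in W} w^2$.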
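Learning with the Laplace approximation: approximating the posterior
Once the MAP estimate is found, the posterior can be approximated as
$$p(W \mid Y, X) \approx q(W) = \mathcal{N}(W \mid W_{MAP}, \{\Lambda(W_{MAP})\}^{-1}),$$
$$\Lambda(W) = -\nabla_W^2 \log p(W \mid Y, X) = \frac{1}{\sigma_w^2} I + \frac{1}{\sigma_y^2} H,$$
where $H$ is the Hessian matrix of the error function.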
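Learning with the Laplace approximation: approximating the predictive distribution
With the Laplace approximation, the predictive distribution can be approximated as
$$p(y_* \mid x_*, Y, X) = \int p(y_* \mid x_*, W)\, p(W \mid X, Y)\, dW \approx \int p(y_* \mid x_*, W)\, q(W)\, dW.$$
However, $p(y_* \mid x_*, W)$ contains the NN, so the integral cannot be computed analytically.
We therefore assume that the posterior density of the parameters is concentrated around the MAP estimate, and that within that small region $f(x_*; W)$ is well approximated by a linear function of $W$. Under this assumption, a first-order Taylor expansion of $f(x_*; W)$ as a function of $W$ around $W_{MAP}$ gives
$$f(x_*; W) \approx f(x_*; W_{MAP}) + g^{\top}(W - W_{MAP}), \quad g = \nabla_W f(x_*; W) \big|_{W = W_{MAP}}.$$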
Learning with the Laplace approximation: approximating the predictive distribution
Putting this together gives the following approximation:
$$p(y_* \mid x_*, Y, X) = \int p(y_* \mid x_*, W)\, p(W \mid X, Y)\, dW$$
$$\approx \int p(y_* \mid x_*, W)\, q(W)\, dW \quad \text{(Laplace approximation)}$$
$$= \int \mathcal{N}(y_* \mid f(x_*; W), \sigma_y^2)\, \mathcal{N}(W \mid W_{MAP}, \{\Lambda(W_{MAP})\}^{-1})\, dW$$
$$\approx \int \mathcal{N}(y_* \mid f(x_*; W_{MAP}) + g^{\top}(W - W_{MAP}), \sigma_y^2)\, \mathcal{N}(W \mid W_{MAP}, \{\Lambda(W_{MAP})\}^{-1})\, dW \quad \text{(first-order Taylor expansion)}$$
$$= \mathcal{N}(y_* \mid f(x_*; W_{MAP}), \sigma^2(x_*)),$$
$$\sigma^2(x_*) = \sigma_y^2 + g^{\top} \{\Lambda(W_{MAP})\}^{-1} g.$$
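The predictive variance $\sigma^2(x_*) = \sigma_y^2 + g^{\top}\Lambda^{-1}g$ can be checked numerically. The following is a minimal sketch, not the slides' implementation: it assumes a model that is linear in its weights, $f(x; W) = w^{\top}\phi(x)$, as a stand-in for the NN, so that the Hessian $H = \Phi^{\top}\Phi$ and the gradient $g = \phi(x_*)$ are exact. The function name `laplace_predictive` and the toy data are hypothetical.

```python
import numpy as np

def laplace_predictive(Phi, y, phi_star, sigma_y=0.1, sigma_w=1.0):
    """Laplace approximation for a model that is linear in its weights (assumption)."""
    D = Phi.shape[1]
    H = Phi.T @ Phi                                  # Hessian of E(W) = 0.5 * sum (y_n - f_n)^2
    Lam = np.eye(D) / sigma_w**2 + H / sigma_y**2    # Lambda(W) = I / sigma_w^2 + H / sigma_y^2
    # Setting the log-posterior gradient to zero gives Lam @ w_map = Phi^T y / sigma_y^2
    w_map = np.linalg.solve(Lam, Phi.T @ y / sigma_y**2)
    g = phi_star                                     # gradient of f(x*; W) with respect to W
    mean = g @ w_map
    var = sigma_y**2 + g @ np.linalg.solve(Lam, g)   # sigma^2(x*) from the slide
    return mean, var

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=30)
Phi = np.stack([np.ones_like(x), x], axis=1)         # features: bias and x
y = 0.5 + 2.0 * x + 0.1 * rng.standard_normal(30)

mean_near, var_near = laplace_predictive(Phi, y, np.array([1.0, 0.0]))  # x* = 0
mean_far, var_far = laplace_predictive(Phi, y, np.array([1.0, 5.0]))    # x* = 5
print(var_near, var_far)
```

The variance never drops below the observation noise $\sigma_y^2$, and it grows for inputs far from the training data, which is the behaviour the $g^{\top}\Lambda^{-1}g$ term is meant to capture.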
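Learning with Hamiltonian Monte Carlo (HMC)
HMC can be applied whenever the log posterior (whose negative is the potential energy of the Hamiltonian) is differentiable with respect to the variables to be sampled. Given enough computation time, samples from the true posterior are obtained in theory (a property of MCMC). As a result, uncertainty can be represented by a set of samples.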
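Learning with Hamiltonian Monte Carlo (HMC): inferring the weight parameters
Using the unnormalized posterior, the corresponding potential energy is
$$\mathcal{U}(W) = -\{\log p(Y \mid X, W) + \log p(W)\}.$$
Differentiating this shows that it is equivalent to the derivative of the cost function introduced earlier.
⟹ Gradients can be computed with backpropagation.
[Issues with MCMC-based approximate inference for NNs]
‣ There is no way to know whether enough samples have been drawn.
‣ Tuning the MCMC parameters is hard (e.g. the step size and number of steps in HMC).
‣ Training is computationally expensive.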
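A single HMC transition with potential energy $\mathcal{U}(W)$ can be sketched as follows. This is a minimal illustration under stated assumptions: the target is a toy two-dimensional Gaussian instead of an NN posterior, gradients are written by hand instead of obtained by backpropagation, and the names `hmc_step`, `eps`, and `n_leapfrog` are hypothetical.

```python
import numpy as np

def hmc_step(w, U, grad_U, eps, n_leapfrog, rng):
    """One HMC transition: leapfrog integration followed by a Metropolis test."""
    p0 = rng.standard_normal(w.shape)        # resample the momentum
    w_new, p = w.copy(), p0.copy()
    p = p - 0.5 * eps * grad_U(w_new)        # initial half step for the momentum
    for i in range(n_leapfrog):
        w_new = w_new + eps * p              # full step for the position
        if i < n_leapfrog - 1:
            p = p - eps * grad_U(w_new)      # full step for the momentum
    p = p - 0.5 * eps * grad_U(w_new)        # final half step for the momentum
    h_old = U(w) + 0.5 * p0 @ p0             # Hamiltonian before the trajectory
    h_new = U(w_new) + 0.5 * p @ p           # Hamiltonian after the trajectory
    if np.log(rng.uniform()) < h_old - h_new:
        return w_new, True
    return w, False

# Toy target (assumption): independent Gaussians with these means and standard deviations
mu_t = np.array([1.0, -1.0])
s_t = np.array([0.5, 2.0])
U = lambda w: 0.5 * np.sum(((w - mu_t) / s_t) ** 2)
grad_U = lambda w: (w - mu_t) / s_t**2

rng = np.random.default_rng(0)
w, samples, n_acc = np.zeros(2), [], 0
for _ in range(5000):
    w, accepted = hmc_step(w, U, grad_U, eps=0.3, n_leapfrog=7, rng=rng)
    samples.append(w)
    n_acc += accepted
samples = np.array(samples)
accept_rate = n_acc / 5000
print(samples.mean(axis=0), accept_rate)
```

The step size and the number of leapfrog steps are exactly the tuning parameters the slide lists as difficult to set; here they were chosen by hand for this toy target.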
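Learning with Hamiltonian Monte Carlo (HMC): inferring the hyperparameters
By placing priors on the hyperparameters $\sigma_w$ and $\sigma_y$, they can be inferred jointly with $W$.
Introduce the precision parameter $\gamma_w = \sigma_w^{-2}$ and define its prior as a gamma distribution:
$$p(\gamma_w) = \mathrm{Gam}(\gamma_w \mid a_w, b_w) \quad (a_w, b_w \text{ are fixed positive values})$$
Likewise for $\sigma_y$:
$$\gamma_y = \sigma_y^{-2}, \quad p(\gamma_y) = \mathrm{Gam}(\gamma_y \mid a_y, b_y) \quad (a_y, b_y \text{ are fixed positive values})$$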
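Learning with Hamiltonian Monte Carlo (HMC): inferring the hyperparameters
Rewriting the model (the joint distribution including the parameters) gives
$$p(Y, W, \gamma_w, \gamma_y \mid X) = p(\gamma_w)\, p(\gamma_y)\, p(W \mid \gamma_w) \prod_{n=1}^{N} p(y_n \mid x_n, W, \gamma_y).$$
The posterior to be inferred is $p(W, \gamma_w, \gamma_y \mid X, Y)$.
[Figure: graphical model with hyperparameters $\alpha_w, \beta_w, \alpha_y, \beta_y$, precisions $\gamma_w, \gamma_y$, weights $W$, and $x_n, y_n$ for $n = 1, \dots, N$]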
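Learning with Hamiltonian Monte Carlo (HMC): inferring the hyperparameters
$W$, $\gamma_w$, and $\gamma_y$ are sampled with Gibbs sampling.
‣ Sampling $W$: as before, sample with HMC,
$$W \sim p(W \mid Y, X, \gamma_w, \gamma_y).$$
‣ Sampling $\gamma_w$:
$$p(\gamma_w \mid Y, X, W, \gamma_y) \propto p(W \mid \gamma_w)\, p(\gamma_w)$$
Since $p(W \mid \gamma_w)$ is Gaussian and $p(\gamma_w)$ is a gamma distribution (the conjugate prior for a Gaussian precision), $p(\gamma_w \mid Y, X, W, \gamma_y)$ is also a gamma distribution:
$$\gamma_w \sim \mathrm{Gam}(\hat{a}_w, \hat{b}_w), \quad \hat{a}_w = a_w + \frac{K_w}{2}, \quad \hat{b}_w = b_w + \frac{1}{2} \sum_{w \in W} w^2,$$
where $K_w$ is the total number of weight parameters.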
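Learning with Hamiltonian Monte Carlo (HMC): inferring the hyperparameters
‣ Sampling $\gamma_y$:
$$p(\gamma_y \mid Y, X, W, \gamma_w) \propto p(\gamma_y) \prod_{n=1}^{N} p(y_n \mid x_n, W, \gamma_y)$$
$\prod_{n=1}^{N} p(y_n \mid x_n, W, \gamma_y)$ is a product of Gaussians and hence Gaussian, and $p(\gamma_y)$ is a gamma distribution, so $p(\gamma_y \mid Y, X, W, \gamma_w)$ is a gamma distribution:
$$\gamma_y \sim \mathrm{Gam}(\hat{a}_y, \hat{b}_y), \quad \hat{a}_y = a_y + \frac{N}{2}, \quad \hat{b}_y = b_y + \frac{1}{2} \sum_{n=1}^{N} \{y_n - f(x_n; W)\}^2.$$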
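Learning with Hamiltonian Monte Carlo (HMC): inferring the hyperparameters
The gamma distribution $\mathrm{Gam}(a, b)$ has mean $a/b$ and variance $a/b^2$. A large $\hat{b}_y$ therefore means that $f(x_n; W)$ predicts $y_n$ poorly, and the model learns a larger observation variance (a smaller precision $\gamma_y$).
Here a single precision parameter $\gamma_w$ is shared by all weights, but it is also possible to use a separate precision per layer of the NN, $(\gamma_w^{(1)}, \dots, \gamma_w^{(L)})$.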
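The two conjugate Gibbs updates for the precisions can be sketched as below. This is an illustrative sketch only: the function name `sample_precisions` is hypothetical, the weights and residuals are synthetic draws in place of an actual HMC sample of $W$, and NumPy's gamma sampler is parameterized by shape and scale, so the rate $\hat{b}$ enters as `1 / b_hat`.

```python
import numpy as np

def sample_precisions(W, residuals, a_w, b_w, a_y, b_y, rng):
    """Draw gamma_w and gamma_y from their gamma conditionals given W."""
    a_w_hat = a_w + W.size / 2.0                    # a_w + K_w / 2
    b_w_hat = b_w + 0.5 * np.sum(W**2)              # b_w + 0.5 * sum of w^2
    a_y_hat = a_y + residuals.size / 2.0            # a_y + N / 2
    b_y_hat = b_y + 0.5 * np.sum(residuals**2)      # b_y + 0.5 * sum of squared errors
    gamma_w = rng.gamma(shape=a_w_hat, scale=1.0 / b_w_hat)
    gamma_y = rng.gamma(shape=a_y_hat, scale=1.0 / b_y_hat)
    return gamma_w, gamma_y

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal(2000)          # stand-in weights with standard deviation 0.5
residuals = 0.1 * rng.standard_normal(2000)  # stand-in for y_n - f(x_n; W)
gamma_w, gamma_y = sample_precisions(W, residuals, 1.0, 1.0, 1.0, 1.0, rng)
print(gamma_w, gamma_y)
```

In a full sampler this step would alternate with an HMC update of $W$ conditioned on the freshly drawn precisions.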
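Speeding up approximate Bayesian inference
[Drawbacks of Bayesian neural networks]
‣ Marginalizing over the parameters is computationally expensive.
  ⟹ They have rarely been used as prediction tools.
‣ Deep learning also needs large amounts of training data.
  ⟹ Methods that assume batch learning are computationally inefficient.
[How can these drawbacks be addressed?]
‣ Use approximate inference for the marginalization to improve computational efficiency.
‣ Introduce minibatch learning.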
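Learning with stochastic gradient Langevin dynamics
[Problem]
Learning with MCMC is computationally inefficient on large-scale data.
[Solution]
Combine a computationally efficient minibatch-based learning method (e.g. stochastic gradient descent) with MCMC, which can estimate uncertainty (e.g. the Metropolis-Hastings (MH) method or HMC).
⟹ Stochastic Markov chain Monte Carlo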
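Learning with stochastic gradient Langevin dynamics
[Learning]
We consider learning with stochastic gradient Langevin dynamics (SGLD), which combines stochastic gradient descent (SGD) with Langevin dynamics.
Write the parameter update as $W_{new} = W_{old} + \Delta W$.
In SGD, the update step can be written as
$$\Delta W = \frac{\alpha_t}{2} \nabla_W \log p(W \mid X_S, Y_S) = \frac{\alpha_t}{2} \left\{ \frac{N}{M} \sum_{n \in S} \nabla_W \log p(y_n \mid x_n, W) + \nabla_W \log p(W) \right\},$$
where $M$ is the size of the subsample $S$. To fit the Robbins-Monro framework, the learning rate $\alpha_t$ at step $t$ is chosen to satisfy
$$\sum_{t=1}^{\infty} \alpha_t = \infty, \quad \sum_{t=1}^{\infty} \alpha_t^2 < \infty.$$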
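Learning with stochastic gradient Langevin dynamics
[Learning]
Langevin dynamics, on the other hand, is a batch learning algorithm in which a single step is taken to obtain each sample. With potential energy $\mathcal{U} = -\log p(W \mid X, Y)$, step size $\epsilon = \sqrt{\alpha_t}$, and momentum vector $p$, the update step is
$$\Delta W = -\frac{\epsilon^2}{2} \nabla_W \mathcal{U} + \epsilon p = \frac{\alpha_t}{2} \nabla_W \log p(W \mid X, Y) + \sqrt{\alpha_t}\, p$$
$$= \frac{\alpha_t}{2} \left\{ \sum_{n=1}^{N} \nabla_W \log p(y_n \mid x_n, W) + \nabla_W \log p(W) \right\} + \sqrt{\alpha_t}\, p, \quad p \sim \mathcal{N}(0, I).$$
Making $\alpha_t$ small brings the acceptance rate of the MH method arbitrarily close to 1.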
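Learning with stochastic gradient Langevin dynamics
[Learning]
Combining the two (SGD and Langevin dynamics) gives the update step
$$\Delta W = \frac{\alpha_t}{2} \left\{ \frac{N}{M} \sum_{n \in S} \nabla_W \log p(y_n \mid x_n, W) + \nabla_W \log p(W) \right\} + \sqrt{\alpha_t}\, p, \quad p \sim \mathcal{N}(0, I).$$
The learning rate satisfies the same conditions as before.
‣ When $t$ is small (early in training): the strengths of SGD let the method explore the posterior space efficiently.
‣ As $t$ grows: Langevin dynamics yields approximate samples from the true posterior.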
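The SGLD update can be sketched on a problem whose posterior is known in closed form. This is a minimal sketch under assumptions that are not in the slides: the model is $y_n \sim \mathcal{N}(w, \sigma_y^2)$ with a scalar weight $w \sim \mathcal{N}(0, \sigma_w^2)$, and the step-size schedule $\alpha_t = a(b + t)^{-\kappa}$ with $\kappa \in (0.5, 1]$ is one common choice that satisfies the Robbins-Monro conditions. The names `sgld`, `a`, `b`, and `decay` are hypothetical.

```python
import numpy as np

def sgld(y, M, n_steps, rng, sigma_y=1.0, sigma_w=10.0, a=1e-3, b=10.0, decay=0.55):
    """SGLD for the mean of a Gaussian (toy model, assumption)."""
    N = y.size
    w = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        alpha_t = a * (b + t) ** (-decay)                 # decaying learning rate
        batch = rng.choice(N, size=M, replace=False)      # minibatch S
        grad_lik = np.sum(y[batch] - w) / sigma_y**2      # sum of grad log p(y_n | w) over S
        grad_prior = -w / sigma_w**2                      # grad log p(w)
        grad = (N / M) * grad_lik + grad_prior
        # SGD half step plus Gaussian noise with variance alpha_t
        w = w + 0.5 * alpha_t * grad + np.sqrt(alpha_t) * rng.standard_normal()
        samples[t] = w
    return samples

rng = np.random.default_rng(0)
y = 2.0 + rng.standard_normal(1000)
samples = sgld(y, M=50, n_steps=5000, rng=rng)

# Exact posterior mean for this conjugate model, for comparison
post_mean = y.sum() / (y.size + 1.0 / 10.0**2)
print(samples[2500:].mean(), post_mean)
```

Early iterations behave like SGD and move quickly toward the mode; later iterations are dominated by the injected noise and scatter around the posterior mean.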
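Stochastic variational inference
The previous part combined stochastic gradient methods with MCMC. Next, we combine variational inference with stochastic gradient descent.
⟹ Stochastic variational inference
With $\xi$ the set of variational parameters, the goal is to find an approximate distribution $q(W; \xi)$ such that
$$q(W; \xi) \approx p(W \mid X, Y).$$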
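Stochastic variational inference
Minibatches are introduced for efficiency. The full-data ELBO is
$$\mathcal{L}(\xi) = \sum_{n=1}^{N} \int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW - D_{KL}[q(W; \xi)\,\|\,p(W)],$$
and its minibatch version is
$$\mathcal{L}_S(\xi) = \frac{N}{M} \sum_{n \in S} \int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW - D_{KL}[q(W; \xi)\,\|\,p(W)].$$
The minibatch quantity $\mathcal{L}_S$ is an unbiased estimator of $\mathcal{L}$:
$$\mathbb{E}_S[\mathcal{L}_S(\xi)] = \mathcal{L}(\xi).$$
Therefore, maximizing $\mathcal{L}_S(\xi)$ instead of $\mathcal{L}(\xi)$ approximates the posterior of the parameters efficiently.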
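Stochastic variational inference
In the following slides, the approximate distribution is assumed to be a product of independent Gaussians, and the ELBO is maximized with a gradient method:
$$q(W; \xi) = \prod_{i,j,l} \mathcal{N}\big(w_{i,j}^{(l)} \mid \mu_{i,j}^{(l)}, (\sigma_{i,j}^{(l)})^2\big)$$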
Monte Carlo approximation of gradients
When maximizing the ELBO of a neural network, the parameters $W$ cannot be integrated out of the ELBO analytically.
⟹ Maximize $\mathcal{L}_S(\xi)$ with a gradient method.
To do so, the gradient of $\mathcal{L}_S(\xi)$ with respect to the variational parameters $\xi$ is needed.
‣ $D_{KL}[q(W; \xi)\,\|\,p(W)]$: both distributions are Gaussian, so its gradient can be computed analytically.
‣ The log-likelihood term $\int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW$ cannot be integrated analytically.
The idea is to approximate this integral (the log-likelihood term) with Monte Carlo and obtain an estimate of the gradient.
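Monte Carlo approximation of gradients
[Goal]
For a parameter $w \in \mathbb{R}$, a function $f(w)$, and a distribution $q(w; \xi)$, estimate the gradient
$$I(\xi) = \nabla_\xi \int f(w)\, q(w; \xi)\, dw.$$
[Methods]
Score function estimation, reparameterization gradients, generalized reparameterization gradients, implicit differentiation, and others.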
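Monte Carlo approximation of gradients: score function estimation
Rewrite $I(\xi)$ as follows:
$$I(\xi) = \nabla_\xi \int f(w)\, q(w; \xi)\, dw = \int f(w)\, \nabla_\xi q(w; \xi)\, dw = \int f(w)\, q(w; \xi)\, \nabla_\xi \log q(w; \xi)\, dw = \mathbb{E}_{q(w; \xi)}[f(w)\, \nabla_\xi \log q(w; \xi)]$$
Therefore, drawing several samples of $w$ from $q(w; \xi)$ and then evaluating the derivative gives an unbiased estimator of $I(\xi)$.
[Condition for applicability] The derivative of $\log q(w; \xi)$ can be computed.
[Issue] In practice the estimator has very high variance.
[Remedy] Combine it with variance reduction techniques such as control variates.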
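A minimal sketch of the score function estimator for a Gaussian $q(w; \xi) = \mathcal{N}(w \mid \mu, \sigma^2)$ follows. The Gaussian choice, the test function $f(w) = w^2$, and the name `score_function_grad` are assumptions made for illustration; for this $f$ the exact gradients are $\partial_\mu \mathbb{E}[w^2] = 2\mu$ and $\partial_\sigma \mathbb{E}[w^2] = 2\sigma$.

```python
import numpy as np

def score_function_grad(f, mu, sigma, n_samples, rng):
    """Estimate the gradient of E_q[f(w)] w.r.t. (mu, sigma) with the score function."""
    w = mu + sigma * rng.standard_normal(n_samples)        # w ~ q(w; xi)
    dlogq_dmu = (w - mu) / sigma**2                        # d log q / d mu
    dlogq_dsigma = ((w - mu) ** 2 - sigma**2) / sigma**3   # d log q / d sigma
    fw = f(w)
    return np.mean(fw * dlogq_dmu), np.mean(fw * dlogq_dsigma)

rng = np.random.default_rng(0)
g_mu, g_sigma = score_function_grad(lambda w: w**2, 1.0, 0.5, 200_000, rng)
print(g_mu, g_sigma)   # compare with the exact values 2*mu = 2.0 and 2*sigma = 1.0
```

Only $f$ itself is evaluated, never its derivative, which is why the estimator applies broadly but needs many samples to tame its variance.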
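Monte Carlo approximation of gradients: reparameterization gradients
Instead of sampling $w$ from $q(w; \xi)$, sample $\epsilon$ from a distribution $q(\epsilon)$ that does not depend on $\xi$ and apply a transformation $w = g(\xi; \epsilon)$, which samples $w$ indirectly. This gives an unbiased estimator of the gradient:
$$\mathbb{E}_{q(\epsilon)}[f'(g(\xi; \epsilon))\, \nabla_\xi g(\xi; \epsilon)] = I(\xi)$$
[Example] $\xi = \{\hat{\mu}, \hat{\sigma}^2\}$, $q(w; \xi) = \mathcal{N}(w \mid \hat{\mu}, \hat{\sigma}^2)$
Setting $\tilde{\epsilon} \sim \mathcal{N}(0, 1) = q(\epsilon)$ and $\tilde{w} = g(\xi; \tilde{\epsilon}) = \hat{\mu} + \hat{\sigma} \tilde{\epsilon}$ samples $\tilde{w}$ from $\mathcal{N}(\hat{\mu}, \hat{\sigma}^2)$. The derivatives with respect to the variational parameters are then as follows, giving an unbiased gradient estimator for each variational parameter:
$$\frac{\partial}{\partial \hat{\mu}} \int f(w)\, q(w; \xi)\, dw = \int f'(w)\, q(w; \xi)\, dw \quad \therefore\ I(\hat{\mu}) = \mathbb{E}_{q(w; \xi)}[f'(w)]$$
$$\frac{\partial}{\partial \hat{\sigma}} \int f(w)\, q(w; \xi)\, dw = \int f'(w)\, \frac{w - \hat{\mu}}{\hat{\sigma}}\, q(w; \xi)\, dw \quad \therefore\ I(\hat{\sigma}) = \mathbb{E}_{q(w; \xi)}\left[f'(w)\, \frac{w - \hat{\mu}}{\hat{\sigma}}\right]$$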
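The Gaussian example above translates almost directly into code. The sketch below is illustrative only: the function name `reparam_grad` and the test function $f(w) = w^2$ (so $f'(w) = 2w$) are assumptions, and the code uses the identity $(w - \hat{\mu})/\hat{\sigma} = \epsilon$.

```python
import numpy as np

def reparam_grad(f_prime, mu, sigma, n_samples, rng):
    """Reparameterization estimate of the gradient of E_q[f(w)] w.r.t. (mu, sigma)."""
    eps = rng.standard_normal(n_samples)   # eps ~ q(eps) = N(0, 1), independent of xi
    w = mu + sigma * eps                   # w = g(xi; eps)
    fp = f_prime(w)
    # dw/dmu = 1 and dw/dsigma = eps = (w - mu) / sigma
    return np.mean(fp), np.mean(fp * eps)

rng = np.random.default_rng(0)
g_mu, g_sigma = reparam_grad(lambda w: 2.0 * w, 1.0, 0.5, 100_000, rng)
print(g_mu, g_sigma)   # compare with the exact values 2*mu = 2.0 and 2*sigma = 1.0
```

Unlike the score function estimator, this one requires $f$ to be differentiable, and in exchange it typically reaches the same accuracy with far fewer samples.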
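Monte Carlo approximation of gradients: generalizing reparameterization gradients
[Advantage of reparameterization gradients]
Compared with score function estimation, the variance of the gradient is kept small.
[Issue with reparameterization gradients]
A change of variables $g$ is required, so the method does not apply to every distribution.
[Solution, example 1] Generalized reparameterization gradients
‣ Relax the constraints on $g$ so that the method applies to many more kinds of distributions.
‣ Allow a residual dependence on the variational parameters, as in $q(\epsilon; \xi)$.
[Solution, example 2] Implicit differentiation
[Conditions for use]
‣ Finding $g$ is hard, but the inverse transformation $g^{-1}$ is easy to obtain.
‣ The distribution is over continuous values.
The gradient of the expectation is obtained by differentiating $\epsilon = g^{-1}(w; \xi)$ with respect to $\xi$.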
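Monte Carlo approximation of gradients: generalizing reparameterization gradients
[Solution, example 3] Continuous relaxation
A way to apply reparameterization gradients to discrete probability distributions.
[Example] The categorical distribution (discrete) coincides with the Gumbel-softmax distribution (continuous) in the limit where its temperature parameter goes to 0.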
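Variational inference with approximate gradients
We now use reparameterization gradients to maximize the ELBO of a Bayesian neural network.
① Draw a minibatch $S$ at random from the dataset.
② Draw $M$ noise samples (one per data point in the minibatch): $\tilde{\epsilon}_n \sim \mathcal{N}(0, I)$.
③ Compute the gradient with respect to the variational parameters.
$$\mathcal{L}_S(\xi) = \frac{N}{M} \sum_{n \in S} \int q(W; \xi) \log p(y_n \mid f(x_n; W))\, dW - D_{KL}[q(W; \xi)\,\|\,p(W)]$$
$$= \frac{N}{M} \sum_{n \in S} \int p(\epsilon) \log p(y_n \mid f(x_n; g(\xi; \epsilon)))\, d\epsilon - D_{KL}[q(W; \xi)\,\|\,p(W)]$$
$$\approx \mathcal{L}_{S,\epsilon}(\xi) \quad (\because\ \mathbb{E}_{S,\epsilon}[\mathcal{L}_{S,\epsilon}(\xi)] = \mathcal{L}(\xi))$$
$$= \frac{N}{M} \sum_{n \in S} \log p(y_n \mid f(x_n; g(\xi; \tilde{\epsilon}_n))) - D_{KL}[q(W; \xi)\,\|\,p(W)],$$
$$\nabla_\xi \mathcal{L}_S(\xi) \approx \nabla_\xi \mathcal{L}_{S,\epsilon}(\xi) = \frac{N}{M} \sum_{n \in S} \nabla_\xi \log p(y_n \mid f(x_n; g(\xi; \tilde{\epsilon}_n))) - \nabla_\xi D_{KL}[q(W; \xi)\,\|\,p(W)].$$
④ Update the variational parameters in the direction that increases the ELBO:
$$\xi \leftarrow \xi + \alpha \nabla_\xi \mathcal{L}_{S,\epsilon}(\xi)$$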
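Steps ① to ④ can be sketched end to end as follows. This is a rough sketch under several assumptions that are not in the slides: a linear model $f(x; w) = w^{\top}x$ replaces the NN so that gradients can be written by hand, the variational parameters are $\xi = (\mu, \log \sigma)$ rather than $(\mu, \sigma)$ to keep $\sigma$ positive, and the prior is $\mathcal{N}(0, \sigma_w^2 I)$. The name `elbo_grad` and all constants are hypothetical.

```python
import numpy as np

def elbo_grad(mu, log_sigma, Xb, yb, N, rng, sigma_y=0.5, sigma_w=1.0):
    """Reparameterization estimate of the minibatch ELBO gradient (linear model)."""
    M, D = Xb.shape
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((M, D))        # step 2: one noise draw per data point
    W = mu + sigma * eps                     # w_n = g(xi; eps_n)
    resid = yb - np.sum(W * Xb, axis=1)      # y_n - f(x_n; w_n)
    g = resid[:, None] * Xb / sigma_y**2     # d log p(y_n | f) / d w_n
    scale = N / M
    # Likelihood term via the chain rule; KL term between Gaussians is analytic
    grad_mu = scale * g.sum(axis=0) - mu / sigma_w**2
    grad_ls = scale * (g * eps).sum(axis=0) * sigma - (sigma**2 / sigma_w**2 - 1.0)
    return grad_mu, grad_ls

rng = np.random.default_rng(0)
N, D, M = 500, 2, 50
w_true = np.array([1.0, -2.0])
X = rng.standard_normal((N, D))
y = X @ w_true + 0.5 * rng.standard_normal(N)

mu, log_sigma = np.zeros(D), np.full(D, -2.0)
for step in range(3000):
    idx = rng.choice(N, size=M, replace=False)                      # step 1: minibatch
    g_mu, g_ls = elbo_grad(mu, log_sigma, X[idx], y[idx], N, rng)   # step 3: gradient
    mu = mu + 2e-4 * g_mu                                           # step 4: ascend the ELBO
    log_sigma = log_sigma + 2e-4 * g_ls
sigma = np.exp(log_sigma)
print(mu, sigma)
```

For an actual NN the hand-written gradient `g` would be replaced by backpropagation through $f$, with the rest of the loop unchanged.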
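Learning with expectation propagation
In the forward pass, probabilities are propagated through the neural network to evaluate the marginal likelihood. In the backward pass, expectation propagation is used to compute the gradient of the marginal likelihood in order to learn the parameters.
⟹ Probabilistic backpropagation
Probabilistic backpropagation can process data sequentially, so it scales to training on large datasets. The precision parameter of the observations and the precision parameter governing the prior on the weights can also be inferred approximately.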
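Learning with expectation propagation
[Learning with expectation propagation]
‣ Model
‣ Approximate distribution
‣ Initialization and introduction of the prior factors
‣ Introduction of the likelihood factors
‣ Distribution of the activations
‣ Gradient-based learning
‣ Summary of probabilistic backpropagation
‣ Related methods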
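Learning with expectation propagation: model
[Setup]
Let $y_n \in \mathbb{R}$ and define the likelihood as
$$p(Y \mid X, W, \gamma_y) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid f(x_n; W), \gamma_y^{-1}), \quad p(\gamma_y) = \mathrm{Gam}(\gamma_y \mid \alpha_0^{\gamma_y}, \beta_0^{\gamma_y}).$$
The rectified linear unit (ReLU) is used as the activation function of $f(x_n; W)$.
The parameters $W$ follow independent Gaussians:
$$p(W \mid \gamma_w) = \prod_{l=1}^{L} \prod_{i=1}^{H_l} \prod_{j=1}^{H_{l-1}} \mathcal{N}(w_{i,j}^{(l)} \mid 0, \gamma_w^{-1}), \quad p(\gamma_w) = \mathrm{Gam}(\gamma_w \mid \alpha_0^{\gamma_w}, \beta_0^{\gamma_w}).$$
[Goal]
Approximately infer the posterior
$$p(W, \gamma_y, \gamma_w \mid \mathcal{D}) \propto p(Y \mid X, W, \gamma_y)\, p(W \mid \gamma_w)\, p(\gamma_y)\, p(\gamma_w).$$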
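Learning with expectation propagation: approximate distribution
Probabilistic backpropagation is based on assumed density filtering.
The approximate distribution over the parameters is
$$q(W, \gamma_y, \gamma_w) = \mathrm{Gam}(\gamma_y \mid \alpha^{\gamma_y}, \beta^{\gamma_y})\, \mathrm{Gam}(\gamma_w \mid \alpha^{\gamma_w}, \beta^{\gamma_w}) \prod_{l=1}^{L} \prod_{i=1}^{H_l} \prod_{j=1}^{H_{l-1}} \mathcal{N}(w_{i,j}^{(l)} \mid m_{i,j}^{(l)}, v_{i,j}^{(l)}) = q(\gamma_y)\, q(\gamma_w)\, q(W).$$
This distribution is updated sequentially by the moment matching step of assumed density filtering.
Assumed density filtering:
$$q_{i+1}(\theta) \approx r_{i+1}(\theta) = \frac{1}{Z_{i+1}} f_{i+1}(\theta)\, q_i(\theta), \quad f_i(\theta): \text{factor}$$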
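Learning with expectation propagation: initialization and introduction of the prior factors
[Initialization]
To make the approximate distribution uninformative, initialize with $m_{i,j}^{(l)} = 0$, $v_{i,j}^{(l)} = \infty$, $\alpha^{\gamma_y} = 1$, $\beta^{\gamma_y} = 0$, $\alpha^{\gamma_w} = 1$, $\beta^{\gamma_w} = 0$.
[Introducing the prior factors]
The approximate distribution is updated by adding the factors of the target posterior one at a time. The prior factors of this model are
$$p(\gamma_y),\ p(\gamma_w),\ \{p(w_{i,j}^{(l)} \mid \gamma_w)\}_{i,j,l}.$$
Posterior: $p(W, \gamma_y, \gamma_w \mid \mathcal{D}) \propto p(Y \mid X, W, \gamma_y)\, p(W \mid \gamma_w)\, p(\gamma_y)\, p(\gamma_w)$
Approximate distribution: $q(W, \gamma_y, \gamma_w) = q(\gamma_y)\, q(\gamma_w)\, q(W)$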
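Learning with expectation propagation: initialization and introduction of the prior factors
[Introducing the prior factors]
‣ Adding the factors $p(\gamma_w)$ and $p(\gamma_y)$
The approximate distributions $q(\gamma_y), q(\gamma_w)$ have the same functional form as the priors $p(\gamma_y), p(\gamma_w)$, so the update is
$$q^{new}(\gamma_y)\, q^{new}(\gamma_w)\, q^{new}(W) \approx p(\gamma_y)\, p(\gamma_w)\, q(W),$$
$$\alpha_{new}^{\gamma_y} = \alpha_0^{\gamma_y}, \quad \beta_{new}^{\gamma_y} = \beta_0^{\gamma_y}, \quad \alpha_{new}^{\gamma_w} = \alpha_0^{\gamma_w}, \quad \beta_{new}^{\gamma_w} = \beta_0^{\gamma_w}.$$
In other words, $q(\gamma_y) \leftarrow p(\gamma_y)$ and $q(\gamma_w) \leftarrow p(\gamma_w)$.
Assumed density filtering (for reference):
$$q^{new}(\gamma_y)\, q^{new}(\gamma_w)\, q^{new}(W) \approx r = \frac{1}{Z} f^{new}(\gamma_y, \gamma_w, W)\, q(\gamma_y)\, q(\gamma_w)\, q(W)$$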
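Learning with expectation propagation: initialization and introduction of the prior factors
[Introducing the prior factors]
‣ Adding the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$
$$q^{new}(\gamma_y)\, q^{new}(\gamma_w)\, q^{new}(W) \approx \frac{1}{Z} p(w_{i,j}^{(l)} \mid \gamma_w)\, q(\gamma_y)\, q(\gamma_w)\, q(W) \iff q^{new}(\gamma_w)\, q^{new}(W) \approx \frac{1}{Z} p(w_{i,j}^{(l)} \mid \gamma_w)\, q(\gamma_w)\, q(W)$$
From here on the indices $i, j, l$ are omitted.
The distributions that change are $q(W)$ and $q(\gamma_w)$, and each is updated as follows:
$$q^{new}(W) \approx \frac{1}{Z_0} \underbrace{p(w \mid \gamma_w)\, q(\gamma_w)}_{\text{factor}}\, q(W)$$
$$q^{new}(\gamma_w) \approx \frac{1}{Z_0} \underbrace{p(w \mid \gamma_w)\, q(W)}_{\text{factor}}\, q(\gamma_w)$$
The marked part (underlined on the slide; presumably the terms multiplying the distribution being updated) is treated as the factor. Note that neither update uses the newly updated version of the other distribution, so the order of the two updates does not matter.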
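Learning with expectation propagation: initialization and introduction of the prior factors
[Introducing the prior factors]
‣ Adding the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$: updating $q(W)$
$$q^{new}(W) \approx \frac{1}{Z_0} p(w \mid \gamma_w)\, q(\gamma_w)\, q(W)$$
Since $q(W)$ is Gaussian, moment matching updates the approximate distribution in the same way as in the earlier Gaussian example:
$$m^{new} = m + v \frac{\partial}{\partial m} \log Z_0$$
$$v^{new} = v - v^2 \left\{ \left( \frac{\partial}{\partial m} \log Z_0 \right)^2 - 2 \frac{\partial}{\partial v} \log Z_0 \right\}$$
$$Z_0 = Z(\alpha^{\gamma_w}, \beta^{\gamma_w}) = \int p(w \mid \gamma_w)\, q(W)\, q(\gamma_w)\, dw\, d\gamma_w = \int \mathcal{N}(w \mid 0, \gamma_w^{-1})\, \mathcal{N}(w \mid m, v)\, \mathrm{Gam}(\gamma_w \mid \alpha^{\gamma_w}, \beta^{\gamma_w})\, dw\, d\gamma_w$$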
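Learning with expectation propagation: initialization and introduction of the prior factors
[Introducing the prior factors]
‣ Adding the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$: updating $q(\gamma_w)$
$$q^{new}(\gamma_w) \approx \frac{1}{Z_0} p(w \mid \gamma_w)\, q(W)\, q(\gamma_w)$$
Since $q(\gamma_w)$ is a gamma distribution, moment matching updates the approximate distribution in the same way as in the earlier gamma example:
$$\alpha_{new}^{\gamma_w} = \left\{ Z_0 Z_2 Z_1^{-2} \frac{\alpha^{\gamma_w} + 1}{\alpha^{\gamma_w}} - 1 \right\}^{-1}$$
$$\beta_{new}^{\gamma_w} = \left\{ Z_2 Z_1^{-1} \frac{\alpha^{\gamma_w} + 1}{\beta^{\gamma_w}} - Z_1 Z_0^{-1} \frac{\alpha^{\gamma_w}}{\beta^{\gamma_w}} \right\}^{-1}$$
where $Z_1 = Z(\alpha^{\gamma_w} + 1, \beta^{\gamma_w})$ and $Z_2 = Z(\alpha^{\gamma_w} + 2, \beta^{\gamma_w})$.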
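Learning with expectation propagation: initialization and introduction of the prior factors
[Introducing the prior factors]
The normalization constant $Z(\alpha^{\gamma_w}, \beta^{\gamma_w})$ cannot be computed exactly, so the Student's t distribution that appears during the computation is approximated by a Gaussian with the same mean and variance:
$$Z(\alpha^{\gamma_w}, \beta^{\gamma_w}) = \int \mathcal{N}(w \mid 0, \gamma_w^{-1})\, q(W, \gamma_y, \gamma_w)\, dW\, d\gamma_y\, d\gamma_w$$
$$= \int \mathcal{N}(w \mid 0, \gamma_w^{-1})\, \mathcal{N}(w \mid m, v)\, \mathrm{Gam}(\gamma_w \mid \alpha^{\gamma_w}, \beta^{\gamma_w})\, dw\, d\gamma_w$$
$$= \int \mathrm{St}(w \mid 0, \alpha^{\gamma_w}/\beta^{\gamma_w}, 2\alpha^{\gamma_w})\, \mathcal{N}(w \mid m, v)\, dw$$
$$\approx \int \mathcal{N}(w \mid 0, \beta^{\gamma_w}/(\alpha^{\gamma_w} - 1))\, \mathcal{N}(w \mid m, v)\, dw \quad \text{(t distribution replaced by a Gaussian with equal mean and variance)}$$
$$= \mathcal{N}(m \mid 0, \beta^{\gamma_w}/(\alpha^{\gamma_w} - 1) + v)$$
Here $\mathrm{St}(w \mid 0, \lambda, \nu)$ has precision $\lambda = \alpha^{\gamma_w}/\beta^{\gamma_w}$ and $\nu = 2\alpha^{\gamma_w}$ degrees of freedom, so its variance is $\nu / \{(\nu - 2)\lambda\} = \beta^{\gamma_w}/(\alpha^{\gamma_w} - 1)$.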
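Learning with expectation propagation: introduction of the likelihood factors
After all prior factors have been added, the factors of the likelihood $p(Y \mid X, W, \gamma_y)$ are added one at a time:
$$q^{new}(\gamma_y)\, q^{new}(\gamma_w)\, q^{new}(W) \approx \frac{1}{Z} p(y_i \mid x_i, W, \gamma_y)\, q(\gamma_y)\, q(\gamma_w)\, q(W) \iff q^{new}(\gamma_y)\, q^{new}(W) \approx \frac{1}{Z} p(y_i \mid x_i, W, \gamma_y)\, q(\gamma_y)\, q(W)$$
$q(W)$ is Gaussian and $q(\gamma_y)$ is a gamma distribution, so the updates take the same form as before:
$$q^{new}(W) \approx \frac{1}{Z_0} p(y_i \mid x_i, W, \gamma_y)\, q(\gamma_y)\, q(W)$$
$$q^{new}(\gamma_y) \approx \frac{1}{Z_0} p(y_i \mid x_i, W, \gamma_y)\, q(W)\, q(\gamma_y)$$
⟹ The goal is to compute the normalization constant for the newly added likelihood factor $p(y_i \mid x_i, W, \gamma_y)$, which is the part of the update that differs from the case of adding $p(w_{i,j}^{(l)} \mid \gamma_w)$.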
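Learning with expectation propagation: introduction of the likelihood factors
The normalization constant when the $i$-th data point is added is computed approximately as follows:
$$Z(\alpha^{\gamma_y}, \beta^{\gamma_y}) = \int \mathcal{N}(y_i \mid f(x_i; W), \gamma_y^{-1})\, q(W, \gamma_y, \gamma_w)\, dW\, d\gamma_y\, d\gamma_w$$
$$= \int \mathcal{N}(y_i \mid f(x_i; W), \gamma_y^{-1})\, q(W, \gamma_y)\, dW\, d\gamma_y$$
$$\approx \int \mathcal{N}(y_i \mid z^{(L)}, \gamma_y^{-1})\, \mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})\, \mathrm{Gam}(\gamma_y \mid \alpha^{\gamma_y}, \beta^{\gamma_y})\, dz^{(L)}\, d\gamma_y$$
$$= \int \mathrm{St}(y_i \mid z^{(L)}, \alpha^{\gamma_y}/\beta^{\gamma_y}, 2\alpha^{\gamma_y})\, \mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})\, dz^{(L)}$$
$$\approx \int \mathcal{N}(y_i \mid z^{(L)}, \beta^{\gamma_y}/(\alpha^{\gamma_y} - 1))\, \mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})\, dz^{(L)}$$
$$= \mathcal{N}(y_i \mid m_{z^{(L)}}, \beta^{\gamma_y}/(\alpha^{\gamma_y} - 1) + v_{z^{(L)}})$$
‣ The first approximation assumes that the hidden units of layer $l$, $z^{(l)} \in \mathbb{R}^{H_l}$, follow a distribution with mean $m_{z^{(l)}}$ and variance $v_{z^{(l)}}$ (details on the next slide).
‣ The second approximation replaces the t distribution with a Gaussian of equal mean and variance.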
Learning with expectation propagation: introduction of the likelihood factors
The mean $m_{z^{(L)}}$ and variance $v_{z^{(L)}}$ of $\mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})$ are obtained approximately by a recursive computation.
[Computation]
Assume that the distribution of the hidden units of layer $l$, $z^{(l)} \in \mathbb{R}^{H_l}$, has mean $m_{z^{(l)}}$ and variance $v_{z^{(l)}}$. Let the vector obtained after multiplying by the weight matrix of layer $l$, $W^{(l)} \in \mathbb{R}^{H_l \times H_{l-1}}$ (the activation), be $a^{(l)} = W^{(l)} z^{(l-1)} / \sqrt{H_{l-1}}$. The mean and variance of $a^{(l)}$ are
$$m_{a^{(l)}} = M^{(l)} m_{z^{(l-1)}} / \sqrt{H_{l-1}}$$
$$v_{a^{(l)}} = \{ (M^{(l)} \odot M^{(l)})\, v_{z^{(l-1)}} + V^{(l)} (m_{z^{(l-1)}} \odot m_{z^{(l-1)}}) + V^{(l)} v_{z^{(l-1)}} \} / H_{l-1}$$
where the elements of $M^{(l)}, V^{(l)} \in \mathbb{R}^{H_l \times H_{l-1}}$ are the means $m_{i,j}^{(l)}$ and variances $v_{i,j}^{(l)}$ of the individual parameters, and $\odot$ is the Hadamard product.
‣ The mean $m_{z^{(l-1)}}$ and variance $v_{z^{(l-1)}}$ of the hidden units of layer $l - 1$ give the mean $m_{a^{(l)}}$ and variance $v_{a^{(l)}}$ of the activations of layer $l$.
‣ If the mean $m_{a^{(l)}}$ and variance $v_{a^{(l)}}$ of the activations of layer $l$ in turn give the mean $m_{z^{(l)}}$ and variance $v_{z^{(l)}}$ of the hidden units of layer $l$, the computation can proceed recursively.
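The activation moments can be computed with a few matrix products. The sketch below is illustrative: the name `linear_moments` and the random test values are assumptions, and the weights and hidden units are treated as mutually independent, which is what makes the variance formula hold.

```python
import numpy as np

def linear_moments(M, V, m_z, v_z):
    """Mean and variance of a = W z / sqrt(H_prev) for independent W and z."""
    H_prev = m_z.size
    m_a = M @ m_z / np.sqrt(H_prev)
    # (M * M) is the Hadamard product M ⊙ M from the slide
    v_a = ((M * M) @ v_z + V @ (m_z * m_z) + V @ v_z) / H_prev
    return m_a, v_a

rng = np.random.default_rng(0)
M_l = rng.standard_normal((3, 4))          # weight means for a layer with H_l = 3, H_{l-1} = 4
V_l = rng.uniform(0.1, 0.5, (3, 4))        # weight variances
m_prev = rng.standard_normal(4)            # means of the previous layer's hidden units
v_prev = rng.uniform(0.1, 0.5, 4)          # variances of the previous layer's hidden units
m_a, v_a = linear_moments(M_l, V_l, m_prev, v_prev)
print(m_a, v_a)
```

Sampling $W$ and $z$ from Gaussians with these moments and averaging $Wz/\sqrt{H_{l-1}}$ is a simple way to confirm the two formulas empirically.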
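Learning with expectation propagation: distribution of the activations
We compute the distribution $p(a^{(l)} \mid W^{(l)}, z^{(l-1)})$ of the activation $a^{(l)}$. By the central limit theorem, when the number of hidden units $H_{l-1}$ is large, $a^{(l)}$ is approximately Gaussian:
$$p(a^{(l)} \mid W^{(l)}, z^{(l-1)}) \approx q(a^{(l)}) = \mathcal{N}(a^{(l)} \mid m_{a^{(l)}}, v_{a^{(l)}})$$
When a Gaussian variable passes through a ReLU, the result is a mixture of two distributions, as in the right panel of the figure.
① Samples that came from negative inputs form a point mass with mean $\mu_p = 0$ and variance $\sigma_p^2 = 0$.
② Samples that came from non-negative inputs follow a truncated Gaussian with everything below 0 cut off.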
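Learning with expectation propagation: distribution of the activations
[General formulas for the mean and variance of a mixture]
For a mixture with $K$ components, with mixing proportions $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$, the mean and variance are in general
$$\mathbb{E}[x_{mix}] = \sum_{k=1}^{K} \pi_k \mu_k$$
$$\mathbb{V}[x_{mix}] = \sum_{k=1}^{K} \pi_k (\mu_k^2 + \sigma_k^2) - \mathbb{E}[x_{mix}]^2$$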
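Learning with expectation propagation: distribution of the activations
[Applying this to the mixture of activations]
Let $\pi_p$ and $\pi_t$ be the mixing proportions of the point mass and the truncated Gaussian, so that $\pi_p + \pi_t = 1$.
For an activation distributed as $\mathcal{N}(\mu, \sigma^2)$, write $\bar{\mu} = -\mu/\sigma$. Then
$$\pi_p = \int_{-\infty}^{0} \mathcal{N}(x \mid \mu, \sigma^2)\, dx = \Phi(-\mu/\sigma) = \Phi(\bar{\mu}).$$
The proportion of the truncated Gaussian is therefore
$$\pi_t = 1 - \pi_p = \Phi(-\bar{\mu}).$$
From the reference cited on the slide, the mean $\mu_t$ and variance $\sigma_t^2$ of the truncated Gaussian are
$$\mu_t = \mu + \sigma \frac{\mathcal{N}(\bar{\mu} \mid 0, 1)}{\Phi(-\bar{\mu})}$$
$$\sigma_t^2 = \sigma^2 \left\{ 1 + \bar{\mu} \frac{\mathcal{N}(\bar{\mu} \mid 0, 1)}{\Phi(-\bar{\mu})} - \left( \frac{\mathcal{N}(\bar{\mu} \mid 0, 1)}{\Phi(-\bar{\mu})} \right)^2 \right\}$$
Substituting these into the general formulas for $\mathbb{E}[x_{mix}]$ and $\mathbb{V}[x_{mix}]$ gives the mean and variance of $z$.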
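The mixture formulas can be combined into a short function for the moments of $z = \mathrm{ReLU}(a)$ with $a \sim \mathcal{N}(\mu, \sigma^2)$. This is a minimal sketch with a hypothetical name `relu_moments`; it uses only the standard library and does not guard against the numerical underflow of $\Phi(-\bar{\mu})$ for strongly negative $\mu$.

```python
import math

def relu_moments(mu, sigma):
    """Mean and variance of ReLU(a) for a ~ N(mu, sigma^2), via the two-component mixture."""
    mu_bar = -mu / sigma
    pdf = math.exp(-0.5 * mu_bar**2) / math.sqrt(2.0 * math.pi)   # N(mu_bar | 0, 1)
    pi_t = 0.5 * (1.0 + math.erf(-mu_bar / math.sqrt(2.0)))       # Phi(-mu_bar)
    pi_p = 1.0 - pi_t                                             # Phi(mu_bar)
    ratio = pdf / pi_t
    mu_t = mu + sigma * ratio                                     # truncated Gaussian mean
    var_t = sigma**2 * (1.0 + mu_bar * ratio - ratio**2)          # truncated Gaussian variance
    # Mixture of a point mass at 0 (mean 0, variance 0) and the truncated Gaussian
    mean = pi_p * 0.0 + pi_t * mu_t
    second_moment = pi_p * 0.0 + pi_t * (mu_t**2 + var_t)
    return mean, second_moment - mean**2

m_z, v_z = relu_moments(0.0, 1.0)
print(m_z, v_z)
```

For a standard normal input the mean is $1/\sqrt{2\pi} \approx 0.399$ and the variance is $1/2 - 1/(2\pi) \approx 0.341$, which gives a quick sanity check on the formulas.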
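Learning with expectation propagation: distribution of the activations
In other words, the mean and variance of the hidden units of layer $l$ can be computed from the mean and variance of the activations of layer $l$.
The mean $m_{z^{(L)}}$ and variance $v_{z^{(L)}}$ of $\mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})$ are therefore obtained approximately by a recursive computation.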
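Learning with expectation propagation: gradient-based learning
The input is treated as $z^{(0)}$ with mean $x_i$ and variance 0 (the initial values of the recursion are $m_{z^{(0)}} = x_i$ and $v_{z^{(0)}} = 0$).
The preceding slides described the sequence in which the output $z^{(l-1)}$ of layer $l - 1$ passes through the activation $a^{(l)}$, which is approximately Gaussian by the central limit theorem, to give the mean and variance of the output $z^{(l)}$ of layer $l$. Applying this approximation recursively approximates the distribution of the final layer $z^{(L)}$ by the Gaussian $\mathcal{N}(z^{(L)} \mid m_{z^{(L)}}, v_{z^{(L)}})$.
This yields an approximate expression for the normalization constant:
$$Z(\alpha^{\gamma_y}, \beta^{\gamma_y}) \approx \mathcal{N}(y_i \mid m_{z^{(L)}}, \beta^{\gamma_y}/(\alpha^{\gamma_y} - 1) + v_{z^{(L)}})$$
Once the normalization constant is available, the gradients are obtained by differentiating it with respect to the parameters.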
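Learning with expectation propagation: summary of probabilistic backpropagation
Model definition: $p(W, \gamma_y, \gamma_w \mid \mathcal{D}) \propto p(Y \mid X, W, \gamma_y)\, p(W \mid \gamma_w)\, p(\gamma_y)\, p(\gamma_w)$
Approximate distribution: $q(W, \gamma_y, \gamma_w) = q(\gamma_y)\, q(\gamma_w)\, q(W)$
Initialization of the approximate distribution: $q_0(\gamma_y),\ q_0(\gamma_w),\ q_0(W)$
Introducing the prior factors (part 1):
‣ Add the factor $p(\gamma_y)$: $q(\gamma_y) \leftarrow p(\gamma_y)$
‣ Add the factor $p(\gamma_w)$: $q(\gamma_w) \leftarrow p(\gamma_w)$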
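Learning with expectation propagation: summary of probabilistic backpropagation
Introducing the prior factors (part 2):
for $l = 1$ to $L$ do
  for $j = 1$ to $H_{l-1}$ do
    for $i = 1$ to $H_l$ do
      Add the factor $p(w_{i,j}^{(l)} \mid \gamma_w)$:
        ‣ update $q(W)$
        ‣ update $q(\gamma_w)$
Introducing the likelihood factors $p(y_i \mid x_i, W, \gamma_y)$, where $i \in S$:
‣ Forward pass: recursively compute the means and variances of the hidden units and activations.
‣ Introduce the factor $p(y_i \mid x_i, W, \gamma_y)$: update $q(W)$ and $q(\gamma_y)$.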
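Learning with expectation propagation: related methods
Deterministic variational inference is a method similar to probabilistic backpropagation.
[Drawback of variational inference]
Evaluating the ELBO requires the expectation of the log likelihood, which is approximated with Monte Carlo.
⟹ Low stability
[Deterministic variational inference]
Computing the approximation of the expectation deterministically improves stability.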