
Bayesian Deep Learning (ベイズ深層学習), 6.3

catla
March 27, 2020


Bayesian Deep Learning (ベイズ深層学習), Section 6.3: Structure Learning of Generative Networks


Transcript

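  1. Indian Buffet Process [Generating an Infinite Matrix]

    First, consider an $N \times H$ binary matrix $M$ and construct a generative process for $M$ in the limit $H \to \infty$. Each element $m_{n,h} \in \{0,1\}$ is generated from a Bernoulli distribution $\mathrm{Bern}(\pi_h)$. Furthermore, if the parameter $\pi_h$ is generated from a beta distribution $\mathrm{Beta}(\alpha\beta/H, \beta)$ with hyperparameters $\alpha > 0$ and $\beta > 0$, the distribution of the matrix $M$ can be written as on the next slide.

    $$p(\pi_h) = \mathrm{Beta}\left(\tfrac{\alpha\beta}{H}, \beta\right) = \frac{\Gamma\left(\frac{\alpha\beta}{H} + \beta\right)}{\Gamma\left(\frac{\alpha\beta}{H}\right)\Gamma(\beta)}\, \pi_h^{\frac{\alpha\beta}{H} - 1} (1 - \pi_h)^{\beta - 1}$$

    $$p(m_{n,h} \mid \pi_h) = \mathrm{Bern}(\pi_h) = \pi_h^{m_{n,h}} (1 - \pi_h)^{1 - m_{n,h}}, \qquad n = 1, 2, \dots, N, \quad h = 1, 2, \dots, H$$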
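The finite Beta-Bernoulli model above can be simulated directly. The sketch below is a minimal illustration using only Python's standard library; the function name `sample_finite_beta_bernoulli` is hypothetical. Since $\mathbb{E}[\pi_h] = \frac{\alpha\beta/H}{\alpha\beta/H + \beta} = \frac{\alpha}{\alpha + H}$, the expected number of ones in $M$ is $NH\alpha/(\alpha + H)$, which approaches $N\alpha$ as $H$ grows.

```python
import random


def sample_finite_beta_bernoulli(N, H, alpha, beta, rng=None):
    """Sample an N x H binary matrix (list of rows) from the finite model
    pi_h ~ Beta(alpha*beta/H, beta) and m_{n,h} ~ Bern(pi_h)."""
    rng = rng or random.Random()
    M = [[0] * H for _ in range(N)]
    for h in range(H):
        # One success probability per column (feature / dish).
        pi_h = rng.betavariate(alpha * beta / H, beta)
        for n in range(N):
            M[n][h] = 1 if rng.random() < pi_h else 0
    return M


rng = random.Random(0)
M = sample_finite_beta_bernoulli(N=5, H=50, alpha=3.0, beta=1.0, rng=rng)
# Expected number of ones: N * H * alpha / (alpha + H), about 14.2 here.
print(sum(map(sum, M)))
```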
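  2. Indian Buffet Process [Generating an Infinite Matrix]
    Taking $H \to \infty$ gives $p(M) \to 0$, so the generation probability of every binary matrix (other than the all-zero matrix) becomes 0:

    $$\begin{aligned}
    p(M) &= \prod_{h=1}^{H} \int p(\pi_h) \left\{ \prod_{n=1}^{N} p(m_{n,h} \mid \pi_h) \right\} d\pi_h \\
    &= \prod_{h=1}^{H} \int \frac{\Gamma(\frac{\alpha\beta}{H} + \beta)}{\Gamma(\frac{\alpha\beta}{H})\Gamma(\beta)}\, \pi_h^{\frac{\alpha\beta}{H} - 1}(1 - \pi_h)^{\beta - 1} \left\{ \prod_{n=1}^{N} \pi_h^{m_{n,h}} (1 - \pi_h)^{1 - m_{n,h}} \right\} d\pi_h \\
    &= \prod_{h=1}^{H} \frac{\Gamma(\frac{\alpha\beta}{H} + \beta)}{\Gamma(\frac{\alpha\beta}{H})\Gamma(\beta)} \int \pi_h^{N_h + \frac{\alpha\beta}{H} - 1}(1 - \pi_h)^{N - N_h + \beta - 1}\, d\pi_h \qquad \left(\because N_h = \sum_{n=1}^{N} m_{n,h}\right) \\
    &= \prod_{h=1}^{H} \frac{\Gamma(\frac{\alpha\beta}{H} + \beta)}{\Gamma(\frac{\alpha\beta}{H})\Gamma(\beta)} \cdot \frac{\Gamma(N_h + \frac{\alpha\beta}{H})\,\Gamma(N - N_h + \beta)}{\Gamma(\frac{\alpha\beta}{H} + \beta + N)}
    \end{aligned}$$

    As $H \to \infty$, $\Gamma(\frac{\alpha\beta}{H}) \approx \frac{H}{\alpha\beta}$, so each column with $N_h > 0$ contributes a factor of order $\frac{\alpha\beta}{H}$, while the product of the factors from the all-zero columns converges to a positive constant. Hence $p(M) \to 0$ for any $M$ with at least one nonzero column.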
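The closed form above can be evaluated in log space with `math.lgamma`. The following is a sketch (the helpers `log_p_finite` and `pad_with_zero_columns` are hypothetical names) that pads a fixed $3 \times 3$ identity matrix with all-zero columns and shows $\log p(M)$ falling as $H$ grows, roughly like $-3 \log H$ for three nonzero columns.

```python
import math


def log_p_finite(M, alpha, beta):
    """log p(M) for the finite model pi_h ~ Beta(alpha*beta/H, beta),
    m_{n,h} ~ Bern(pi_h), with every pi_h integrated out."""
    N, H = len(M), len(M[0])
    a = alpha * beta / H
    total = 0.0
    for h in range(H):
        N_h = sum(M[n][h] for n in range(N))  # number of ones in column h
        total += (math.lgamma(a + beta) - math.lgamma(a) - math.lgamma(beta)
                  + math.lgamma(N_h + a) + math.lgamma(N - N_h + beta)
                  - math.lgamma(a + beta + N))
    return total


def pad_with_zero_columns(M, H):
    """Append all-zero columns until M has H columns."""
    return [row + [0] * (H - len(row)) for row in M]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for H in (10, 100, 1000):
    # log p(M) keeps decreasing as H grows.
    print(H, log_p_finite(pad_with_zero_columns(identity, H), alpha=1.0, beta=1.0))
```

Summing `math.exp(log_p_finite(M, ...))` over all $2^{NH}$ matrices for a tiny $N$ and $H$ is a quick sanity check that the expression is a normalized distribution.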
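  3. Indian Buffet Process [Generating an Infinite Matrix]
    To prevent the generation probability from becoming 0, let $[M]$ denote the equivalence class of matrices that become identical when their columns are reordered into left-ordered form (lof). For example, when

    $$M = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

    $[M]$ contains every column permutation of $M$, such as

    $$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

    Following reference [1], the distribution $p([M])$ of $[M]$ in the limit $H \to \infty$ can be written as on the next slide.

    [1] "Infinite Latent Feature Models and the Indian Buffet Process", 2005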
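The left-ordered form can be computed by sorting the columns as binary numbers, reading the first row as the most significant bit; two matrices belong to the same class $[M]$ exactly when their lof coincide. The sketch below (all function names are hypothetical) also counts the members of $[M]$ as $H!/\prod_i H_i!$, which for the $3 \times 3$ identity gives $3! = 6$ column permutations, of which the slide lists four.

```python
from collections import Counter
from math import factorial


def left_ordered_form(M):
    """Left-ordered form of a binary matrix M (list of rows): columns sorted in
    decreasing order of the binary number they encode, with row 0 as the most
    significant bit."""
    if not M or not M[0]:
        return [list(row) for row in M]
    cols = sorted((tuple(row[h] for row in M) for h in range(len(M[0]))), reverse=True)
    return [[col[n] for col in cols] for n in range(len(M))]


def same_class(A, B):
    """True if A and B have the same lof, i.e. [A] == [B]."""
    return left_ordered_form(A) == left_ordered_form(B)


def class_size(M):
    """Number of distinct matrices in [M]: H! / prod_i H_i!, where H_i counts the
    columns equal to binary pattern i (the all-zero pattern included)."""
    H = len(M[0])
    size = factorial(H)
    for count in Counter(tuple(row[h] for row in M) for h in range(H)).values():
        size //= factorial(count)
    return size


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(same_class(I3, P), class_size(I3))  # True 6
```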
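  4. Indian Buffet Process [Generating an Infinite Matrix]

    $$p([M]) = \sum_{M \in [M]} p(M) = \frac{H!}{\prod_{i \ge 0} H_i!} \prod_{h=1}^{H} \frac{\Gamma(\frac{\alpha\beta}{H} + \beta)}{\Gamma(\frac{\alpha\beta}{H})\Gamma(\beta)} \cdot \frac{\Gamma(N_h + \frac{\alpha\beta}{H})\,\Gamma(N - N_h + \beta)}{\Gamma(\frac{\alpha\beta}{H} + \beta + N)} \;\xrightarrow{H \to \infty}\; \frac{(\alpha\beta)^{H_+}}{\prod_{i \ge 1} H_i!} \exp(-\bar{H}_+) \prod_{h=1}^{H_+} \frac{\Gamma(N_h)\,\Gamma(N - N_h + \beta)}{\Gamma(\beta + N)}$$

    Here $H_i$ is the number of columns of $M$ equal to binary column pattern $i$ (identical column vectors share the same $i$, and $i = 0$ is the all-zero column); we divide by $\prod_i H_i!$ to cancel the duplicates produced by permuting identical columns. $H_+$ is the number of columns $h$ with $N_h > 0$, and $\bar{H}_+ = \alpha \sum_{n=1}^{N} \frac{\beta}{n + \beta - 1}$ is the expected value of $H_+$. This distribution is unchanged when the rows of $M$ are swapped, so it is exchangeable.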
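The limiting expression for $p([M])$ can be evaluated in log space. This is a sketch under the assumption that $M$ is given as a list of rows; the name `log_p_ibp_class` is hypothetical. All-zero columns are dropped, so padding $M$ with zero columns or permuting its columns leaves the value unchanged, as the equivalence-class construction requires.

```python
import math
from collections import Counter


def log_p_ibp_class(M, alpha, beta):
    """log p([M]) for the two-parameter IBP in the limit H -> infinity.
    M is a list of N rows; all-zero columns are ignored."""
    N = len(M)
    H = len(M[0]) if M else 0
    cols = [tuple(row[h] for row in M) for h in range(H)]
    active = [c for c in cols if any(c)]  # columns with N_h > 0
    H_plus = len(active)
    # H_bar_+ = alpha * sum_n beta / (n + beta - 1), written as (n - 1) + beta.
    H_bar = alpha * sum(beta / (n - 1 + beta) for n in range(1, N + 1))
    log_p = H_plus * math.log(alpha * beta) - H_bar
    # Divide by prod_{i>=1} H_i! (multiplicities of identical nonzero columns).
    log_p -= sum(math.lgamma(count + 1) for count in Counter(active).values())
    for c in active:
        N_h = sum(c)
        log_p += math.lgamma(N_h) + math.lgamma(N - N_h + beta) - math.lgamma(beta + N)
    return log_p


print(log_p_ibp_class([[1, 0, 0], [0, 1, 0], [0, 0, 1]], alpha=1.0, beta=1.0))
```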
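  5. Indian Buffet Process [Generating an Infinite Matrix]
    Properties of the generated binary matrix $M \in \{0,1\}^{N \times \infty}$:
    • The number of dishes taken by each customer follows $\mathrm{Poi}(\alpha)$.
    • The expected total number of dishes taken is $N\alpha$.
    • The expected number of distinct dishes taken by the customers is $\bar{H}_+ = \alpha \sum_{n=1}^{N} \frac{\beta}{n + \beta - 1}$.
      • $\lim_{\beta \to 0} \bar{H}_+ = \alpha$: all customers choose the same dishes.
      • $\lim_{\beta \to \infty} \bar{H}_+ = N\alpha$: customers no longer choose the same dishes as one another.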
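The two limits of $\bar{H}_+$ can be checked numerically. A short sketch (the name `expected_num_dishes` is hypothetical); the term for $n = 1$ is always exactly 1, which is why $\bar{H}_+ \to \alpha$ as $\beta \to 0$, while every term tends to 1 as $\beta \to \infty$.

```python
def expected_num_dishes(N, alpha, beta):
    """H_bar_+ = alpha * sum_{n=1}^{N} beta / (n + beta - 1)."""
    # (n - 1) + beta avoids cancellation when beta is tiny and n = 1.
    return alpha * sum(beta / (n - 1 + beta) for n in range(1, N + 1))


N, alpha = 10, 2.0
for beta in (1e-6, 1.0, 1e6):
    # Small beta -> close to alpha; large beta -> close to N * alpha.
    print(beta, expected_num_dishes(N, alpha, beta))
```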
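  6. Indian Buffet Process [Generating an Infinite Matrix]
    [Figure: customer $n = 1$ takes $\mathrm{Poi}(\alpha)$ dishes; each later customer $n = 2, 3, 4, 5$ takes each existing dish $h$ with probability $\frac{N_h}{n + \beta - 1}$ and then $\mathrm{Poi}\left(\frac{\alpha\beta}{n + \beta - 1}\right)$ new dishes. Source: Wikipedia, "Poisson distribution" (ポアソン分布).]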
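The sequential "restaurant" process in the figure can be sketched as follows. Python's `random` module has no Poisson sampler, so `poisson` below uses Knuth's multiplication method, which is adequate for the small rates involved; both function names are hypothetical. Every sampled column contains at least one 1, so the output has exactly $H_+$ columns.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poi(lam) by multiplying uniforms (Knuth); fine for small lam."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(N, alpha, beta, rng=None):
    """Two-parameter IBP: customer n (1-based) takes each existing dish h with
    probability N_h / (n + beta - 1), then takes Poi(alpha * beta / (n + beta - 1))
    new dishes. Returns the N x H_+ binary matrix as a list of rows."""
    rng = rng or random.Random()
    counts = []  # counts[h] = N_h, customers who have taken dish h so far
    rows = []    # rows[n-1] = set of dishes taken by customer n
    for n in range(1, N + 1):
        denom = n - 1 + beta
        taken = {h for h, N_h in enumerate(counts) if rng.random() < N_h / denom}
        num_new = poisson(alpha * beta / denom, rng)
        taken.update(range(len(counts), len(counts) + num_new))
        counts.extend([0] * num_new)
        for h in taken:
            counts[h] += 1
        rows.append(taken)
    return [[1 if h in taken else 0 for h in range(len(counts))] for taken in rows]


M = sample_ibp(N=6, alpha=2.0, beta=1.0, rng=random.Random(0))
for row in M:
    print("".join(map(str, row)))
```

Averaging over many draws, the number of columns should approach $\bar{H}_+$ and each row sum should average $\alpha$.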
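  7. Indian Buffet Process [Generating an Infinite Matrix]
    Customer $n$ takes an existing dish $h$ with probability $\frac{N_h}{n + \beta - 1}$. For example, dish $h = 2$ has already been taken by $N_2 = 3$ customers, so for customer $n = 4$ the entry $m_{4,2}$ becomes 1 with probability $\frac{3}{4 + \beta - 1}$. [Figure: dishes $h = 1, 2, 3$]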
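The arithmetic of this example can be reproduced from the first $n - 1$ rows of $M$. In the sketch below the three earlier rows are hypothetical; only the fact that dish 2 was taken by all three earlier customers ($N_2 = 3$) comes from the slide. With $\beta = 1$, dish 2 is taken with probability $3/4$. The name `dish_probabilities` is also hypothetical.

```python
def dish_probabilities(prev_rows, n, alpha, beta):
    """Given the first n-1 rows of M (customers 1..n-1), return customer n's
    probability N_h / (n + beta - 1) for each existing dish h, and the Poisson
    rate alpha * beta / (n + beta - 1) for the number of new dishes."""
    counts = [sum(col) for col in zip(*prev_rows)]  # N_h for each column
    denom = n - 1 + beta
    return [N_h / denom for N_h in counts], alpha * beta / denom


# Hypothetical first three customers; only N_2 = 3 matches the slide.
prev = [[1, 1, 0],
        [0, 1, 1],
        [1, 1, 0]]
probs, rate = dish_probabilities(prev, n=4, alpha=2.0, beta=1.0)
print(probs, rate)  # [0.5, 0.75, 0.25] 0.5
```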
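  8. Indian Buffet Process [Gibbs Sampling]
    Modeling data $X$ with infinite-dimensional latent variables $\theta$ and a binary matrix $M$ gives the generative model

    $$p(X, M, \theta) = p(X \mid M, \theta)\, p(M)\, p(\theta).$$

    Assuming $\theta$ can be integrated out analytically as $p(X \mid M) = \int p(X \mid M, \theta)\, p(\theta)\, d\theta$, Gibbs sampling can draw each $m_{n,h}$ from the posterior $p(M \mid X)$ as follows:

    $$p(m_{n,h} = 1 \mid M_{\setminus (n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus (n,h)})$$

    Because the Indian buffet process is exchangeable, customer $n$ can be treated as the last of the $N$ customers: the probability that $m_{n,h} = 1$ is drawn from $p(m_{n,h} \mid M_{\setminus (n,h)})$ corresponds to this last customer taking dish $h$ after the other $N - 1$ customers have taken their dishes.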
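A single Gibbs update for an entry in an active column can be sketched as below, using the exchangeability argument so that the prior is $p(m_{n,h} = 1 \mid M_{\setminus(n,h)}) = N_{\setminus n,h}/(N + \beta - 1)$. The collapsed likelihood $p(X \mid M)$ depends on the model, so it is passed in as a function `log_lik`; `gibbs_update_entry` and the Gaussian `toy_log_lik` are hypothetical and only illustrate the mechanics, not the model from the book.

```python
import math
import random


def gibbs_update_entry(M, n, h, log_lik, beta, rng):
    """One collapsed Gibbs step for m_{n,h} (0-based n, h) in a column that stays
    active without row n (N_{-n,h} > 0). Row n is treated as the last of N
    customers, so p(m_{n,h} = 1 | M_-(n,h)) = N_{-n,h} / (N + beta - 1).
    log_lik(M) must return log p(X | M) (up to a constant), theta integrated out."""
    N = len(M)
    N_minus = sum(M[k][h] for k in range(N) if k != n)
    if N_minus == 0:
        raise ValueError("column h is empty without row n; handle it as a new column")
    prior_one = N_minus / (N - 1 + beta)
    log_post = []
    for value, prior in ((0, 1.0 - prior_one), (1, prior_one)):
        M[n][h] = value
        log_post.append(math.log(prior) + log_lik(M))
    top = max(log_post)  # subtract the max for numerical stability
    w0, w1 = (math.exp(lp - top) for lp in log_post)
    p_one = w1 / (w0 + w1)
    M[n][h] = 1 if rng.random() < p_one else 0
    return M[n][h], p_one


# Hypothetical toy likelihood (not from the slides): X[n][h] ~ N(M[n][h], 0.5^2).
X = [[0.9, 0.1], [1.2, -0.2], [0.8, 0.3]]


def toy_log_lik(M):
    return sum(-0.5 * ((x - m) / 0.5) ** 2 for xr, mr in zip(X, M) for x, m in zip(xr, mr))


M = [[1, 0], [0, 0], [1, 1]]
print(gibbs_update_entry(M, n=1, h=0, log_lik=toy_log_lik, beta=1.0, rng=random.Random(0)))
```

With a flat likelihood the returned probability reduces to the prior $N_{\setminus n,h}/(N + \beta - 1)$.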
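  9. Indian Buffet Process [Gibbs Sampling]
    Therefore, an entry $m_{n,h}$ in a column $h$ with $N_{\setminus n,h} = \sum_{n' \ne n} m_{n',h} > 0$ can be sampled by Gibbs sampling by computing the probability $\frac{N_{\setminus n,h}}{N + \beta - 1}$ (the probability that customer $n$ takes dish $h$) together with the likelihood $p(X \mid M)$:

    $$p(m_{n,h} = 1 \mid M_{\setminus (n,h)}, X) \propto p(X \mid M)\, p(m_{n,h} = 1 \mid M_{\setminus (n,h)})$$

    Similarly, new binary columns with $N_{\setminus n,h} = 0$ are generated by computing the probability of the number of new columns $H_{\text{new}}$, which follows $\mathrm{Poi}\left(\frac{\alpha\beta}{N + \beta - 1}\right)$ (the probability that customer $n$ takes new dishes), together with the likelihood $p(X \mid M)$. Because the number of new columns ranges over a countably infinite set, an exact computation would require evaluating $p(X \mid M)$ infinitely many times; in practice the computation is approximated by truncating it to a finite number of candidates (e.g. $H_{\text{new}} \le 10$).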
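The truncated new-column step can be sketched as follows: each candidate $H_{\text{new}} \in \{0, \dots, 10\}$ is weighted by its Poisson prior times the likelihood of $M$ extended with that many new columns owned by row $n$, and one candidate is drawn. As before, `log_lik` is a user-supplied (hypothetical) collapsed likelihood, and the sketch assumes that columns used only by row $n$ have already been removed from $M$; `with_new_columns` and `sample_num_new_columns` are hypothetical names.

```python
import math
import random


def with_new_columns(M, n, k):
    """Copy of M with k extra columns that contain a 1 only in row n."""
    return [row + [1 if i == n else 0] * k for i, row in enumerate(M)]


def sample_num_new_columns(M, n, log_lik, alpha, beta, rng, max_new=10):
    """Sample H_new for row n (0-based) over the truncated candidates 0..max_new.
    Prior: Poi(alpha * beta / (N + beta - 1)); likelihood: log_lik of M extended
    with H_new new columns. Returns (H_new, normalized candidate probabilities)."""
    N = len(M)
    lam = alpha * beta / (N - 1 + beta)
    log_w = [(-lam + k * math.log(lam) - math.lgamma(k + 1))  # log Poisson pmf
             + log_lik(with_new_columns(M, n, k)) for k in range(max_new + 1)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    probs = [wk / total for wk in w]
    k = rng.choices(range(max_new + 1), weights=probs)[0]
    return k, probs


M = [[1, 0], [0, 1], [1, 1]]
k, probs = sample_num_new_columns(M, n=1, log_lik=lambda M: 0.0,
                                  alpha=2.0, beta=1.0, rng=random.Random(0))
M = with_new_columns(M, 1, k)
```

With a flat likelihood the candidate probabilities are just the truncated Poisson pmf, which shows how little mass lies beyond the cutoff for small rates.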
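  12. Infinite Neural Network Model
    Following reference [2], a DNN is constructed using a generative model called the nonlinear Gaussian belief network.
    [Nonlinear Gaussian belief network]
    Consider a network with $L$ layers, and let
      $H_l$: the number of units in layer $l$;
      $z^{(l)}_h$: the $h$-th unit in layer $l$;
      $M^{(l)} \in \{0,1\}^{H_{l-1} \times H_l}$: the adjacency matrix (roughly, a mask on a fully connected layer); the element $m^{(l)}_{h,h'} = 1$ means there is an arrow from $z^{(l)}_{h'}$ to $z^{(l-1)}_h$;
      $W^{(l)} \in \mathbb{R}^{H_{l-1} \times H_l}$: the weight parameters of layer $l$;
      $b^{(l)} \in \mathbb{R}^{H_l}$: the bias parameters of layer $l$;
      $a^{(l)}$: the activation of layer $l$.

    [2] "Learning the Structure of Deep Sparse Graphical Models", 2010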
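The direction convention for the mask is easy to get backwards, so here is a small sketch that lists the arrows encoded by a mask $M^{(l)}$ stored as $H_{l-1}$ rows of length $H_l$. The mask values and the helper names `edges` and `parents` are hypothetical; indices are 0-based.

```python
def edges(M_l):
    """Arrows encoded by the mask M^(l) (list of H_{l-1} rows, each of length
    H_l): m_{h,h'} = 1 gives an arrow z^(l)_{h'} -> z^(l-1)_h, returned as
    (h', h) pairs with 0-based indices."""
    return [(hp, h) for h, row in enumerate(M_l) for hp, m in enumerate(row) if m == 1]


def parents(M_l, h):
    """Units of layer l that point to unit h of layer l-1."""
    return [hp for hp, m in enumerate(M_l[h]) if m == 1]


M1 = [[1, 0, 1],
      [0, 1, 1]]  # hypothetical mask with H_{l-1} = 2, H_l = 3
print(edges(M1))       # [(0, 0), (2, 0), (1, 1), (2, 1)]
print(parents(M1, 1))  # [1, 2]
```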
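  13. Infinite Neural Network Model [Nonlinear Gaussian belief network]
    The activation is written as

    $$a^{(l)} = \left(W^{(l+1)} \odot M^{(l+1)}\right) z^{(l+1)} + b^{(l)}.$$

    Gaussian noise is further added to each activation $a^{(l)}_h$:

    $$\tilde{a}^{(l)}_h = a^{(l)}_h + \epsilon, \qquad \epsilon \sim \mathcal{N}\left(0, (\nu^{(l)}_h)^{-1}\right),$$

    where $\nu^{(l)}_h$ is the noise precision. The hidden unit $z^{(l)}_h$ is then obtained by the transformation

    $$z^{(l)}_h = \phi(\tilde{a}^{(l)}_h), \qquad \phi(\cdot) = \tanh(\cdot).$$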
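Ancestral sampling of one layer follows directly from these three equations. The sketch below treats $\nu$ as a precision (so the noise standard deviation is $\nu^{-1/2}$), which is an assumption consistent with the gamma prior on $\nu$ on the next slide; the parameter values and the name `sample_layer_below` are hypothetical.

```python
import math
import random


def sample_layer_below(z_above, W, M, b, nu, rng):
    """Sample z^(l) given z^(l+1) in a nonlinear Gaussian belief network:
    a = (W * M) z_above + b (elementwise mask), a_tilde = a + eps,
    eps ~ N(0, 1/nu), z = tanh(a_tilde).
    W and M are H_l x H_{l+1} lists of rows; b and nu have length H_l.
    nu is treated as a noise precision (assumption)."""
    z = []
    for h in range(len(b)):
        a_h = sum(W[h][k] * M[h][k] * z_above[k] for k in range(len(z_above))) + b[h]
        eps = rng.gauss(0.0, 1.0 / math.sqrt(nu[h]))  # standard deviation nu^{-1/2}
        z.append(math.tanh(a_h + eps))
    return z


rng = random.Random(0)
z_top = [1.0, -0.5]
W = [[0.5, 2.0], [-1.0, 0.3]]
M = [[1, 0], [1, 1]]  # hypothetical mask: unit 0 ignores parent 1
print(sample_layer_below(z_top, W, M, b=[0.1, 0.0], nu=[100.0, 0.01], rng=rng))
```

With a large precision the unit is close to $\tanh(a)$; with a small one it is usually pushed toward $\pm 1$.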
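  14. Infinite Neural Network Model [Nonlinear Gaussian belief network]
    The distribution of $z^{(l)}_h$ then follows from the change of variables for a random variable, where $\nu^{(l)}_h$ is the noise precision:

    $$p(z^{(l)}_h \mid a^{(l)}_h, \nu^{(l)}_h) = \mathcal{N}\left(\phi^{-1}(z^{(l)}_h) \mid a^{(l)}_h, (\nu^{(l)}_h)^{-1}\right) \left|\frac{\partial \tilde{a}^{(l)}_h}{\partial z^{(l)}_h}\right| = \frac{\mathcal{N}\left(\phi^{-1}(z^{(l)}_h) \mid a^{(l)}_h, (\nu^{(l)}_h)^{-1}\right)}{\phi'\left(\phi^{-1}(z^{(l)}_h)\right)}, \qquad \phi'(a) = \frac{d}{da}\phi(a)$$

    $W^{(l)}$ and $b^{(l)}$ are given Gaussian priors, and $\nu^{(l)}_h$ is given a gamma prior (conjugate priors).
    • Small $\nu^{(l)}_h$: $z^{(l)}_h$ tends to take extreme values (close to $\pm 1$).
    • Large $\nu^{(l)}_h$: $z^{(l)}_h$ takes nearly deterministic values.
    Furthermore, the bottom layer is the data: $z^{(0)}_h = x_h \in (-1, 1)$.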
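For $\phi = \tanh$ the Jacobian term simplifies, because $\phi'(\phi^{-1}(z)) = 1 - \tanh^2(\operatorname{atanh} z) = 1 - z^2$. The sketch below (the name `log_density_z` is hypothetical, and $\nu$ is again treated as a precision) evaluates the log density and checks with a midpoint rule that it integrates to about 1 over $(-1, 1)$.

```python
import math


def log_density_z(z, a, nu):
    """log p(z | a, nu) for z = tanh(a + eps), eps ~ N(0, 1/nu), by change of
    variables: log N(atanh(z) | a, 1/nu) - log phi'(atanh(z)), where
    phi'(x) = 1 - tanh(x)^2, so phi'(atanh(z)) = 1 - z^2."""
    x = math.atanh(z)
    log_normal = 0.5 * math.log(nu / (2.0 * math.pi)) - 0.5 * nu * (x - a) ** 2
    return log_normal - math.log(1.0 - z * z)


# Midpoint-rule check that the density integrates to about 1 over (-1, 1).
K = 20000
total = sum(math.exp(log_density_z(-1.0 + (i + 0.5) * 2.0 / K, a=0.3, nu=2.0))
            for i in range(K)) * 2.0 / K
print(total)
```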
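  15. Infinite Neural Network Model [Cascading Indian Buffet Process]
    We extend the joint distribution above so that the number of layers and the number of hidden units can vary freely.
    First, consider $M^{(1)}$. Its number of rows is fixed to the dimensionality $H_0$ of $x$, so the rows are regarded as the customers of an Indian buffet process. Next, for $M^{(2)}$, the number of rows must be fixed to the number of columns $H_1$ of $M^{(1)}$ obtained from that Indian buffet process, and $M^{(2)}$ can again be sampled with an Indian buffet process. Repeating this and sampling $M^{(1)}, M^{(2)}, M^{(3)}, \dots$ in turn builds the network.
    [Figure: $x \in \mathbb{R}^{H_0} \xrightarrow{\text{IBP}} M^{(1)} \in \{0,1\}^{H_0 \times H_1} \xrightarrow{\text{IBP}} M^{(2)} \in \{0,1\}^{H_1 \times H_2} \xrightarrow{\text{IBP}} M^{(3)} \in \{0,1\}^{H_2 \times H_3} \xrightarrow{\text{IBP}} \cdots$]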
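The cascade can be sketched by feeding the number of dishes of each IBP in as the number of customers of the next one. The sketch below reuses a two-parameter IBP sampler (repeated here so the block is self-contained), uses the same $\alpha$ and $\beta$ for every layer for simplicity, and adds a `max_layers` cap as a practical safeguard; the per-layer hyperparameters, the cap, and all function names are assumptions, not part of the slides.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method for Poi(lam); fine for small lam."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(num_customers, alpha, beta, rng):
    """Two-parameter IBP; returns a num_customers x H_+ binary matrix (list of rows)."""
    counts, rows = [], []
    for n in range(1, num_customers + 1):
        denom = n - 1 + beta
        taken = {h for h, c in enumerate(counts) if rng.random() < c / denom}
        new = poisson(alpha * beta / denom, rng)
        taken.update(range(len(counts), len(counts) + new))
        counts.extend([0] * new)
        for h in taken:
            counts[h] += 1
        rows.append(taken)
    return [[1 if h in t else 0 for h in range(len(counts))] for t in rows]


def sample_cascading_ibp(H0, alpha, beta, rng, max_layers=20):
    """Sample M^(1), M^(2), ...: the H_{l-1} rows of M^(l) are the customers of an
    IBP, and its number of dishes H_l becomes the row count of M^(l+1).
    Stops when a layer has no units (H_l = 0) or after max_layers layers."""
    masks, customers = [], H0
    while customers > 0 and len(masks) < max_layers:
        M = sample_ibp(customers, alpha, beta, rng)
        masks.append(M)
        customers = len(M[0])  # H_l, the next layer's number of rows
    return masks


masks = sample_cascading_ibp(H0=5, alpha=1.5, beta=1.0, rng=random.Random(0))
print([(len(M), len(M[0])) for M in masks])  # (H_{l-1}, H_l) for each layer
```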
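  16. Infinite Neural Network Model [Cascading Indian Buffet Process]
    Q. How do we infer the joint distribution?
    A. Approximately, with MCMC.
    (Example) Split the variables into three blocks, the set of hidden units $Z$, the set of binary matrices $M$, and the set of parameters $\{W, b, \nu\}$, and sample them alternately with Gibbs sampling:

    $$Z \sim p(Z \mid X, M, W, b, \nu), \qquad M \sim p(M \mid X, Z, W, b, \nu), \qquad W, b, \nu \sim p(W, b, \nu \mid X, Z, M)$$

    More concretely, the individual variables are sampled as

    $$z_n \sim p(z_n \mid x_n, M, W, b, \nu), \qquad m_{n,h} \sim p(m_{n,h} \mid X, Z, W, b, \nu, M_{\setminus (n,h)}),$$

    $$W \sim p(W \mid X, Z, M, b, \nu), \qquad b \sim p(b \mid X, Z, M, W, \nu), \qquad \nu \sim p(\nu \mid X, Z, M, W, b).$$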
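The alternating scheme is a systematic-scan block Gibbs sampler. The driver below is a minimal, model-agnostic sketch: each block is redrawn from its full conditional given the current values of the others. For the network model the blocks would be $Z$, $M$, and $\{W, b, \nu\}$ with their model-specific conditionals, which are not implemented here; to keep the example runnable, the conditionals are swapped for those of a toy standard bivariate normal with correlation $\rho$. The name `block_gibbs` and the toy target are hypothetical.

```python
import math
import random


def block_gibbs(state, samplers, num_sweeps, rng):
    """Systematic-scan block Gibbs sampling: each sweep redraws every block from its
    full conditional given the current values of all the other blocks.
    samplers is a list of (block_name, draw) pairs with draw(state, rng) -> new value."""
    trace = []
    for _ in range(num_sweeps):
        for name, draw in samplers:
            state[name] = draw(state, rng)
        trace.append(dict(state))  # snapshot after a full sweep
    return trace


# Toy stand-in target (not the network model): a standard bivariate normal with
# correlation rho, whose full conditionals are x | y ~ N(rho*y, 1 - rho^2) and vice versa.
rho = 0.8
sd = math.sqrt(1.0 - rho ** 2)
samplers = [
    ("x", lambda s, r: r.gauss(rho * s["y"], sd)),
    ("y", lambda s, r: r.gauss(rho * s["x"], sd)),
]
trace = block_gibbs({"x": 0.0, "y": 0.0}, samplers, num_sweeps=20000, rng=random.Random(0))
print(sum(t["x"] * t["y"] for t in trace) / len(trace))  # should be close to rho
```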