
BDA3 Chapter 17



Slides presented at a reading group: Gelman et al., Bayesian Data Analysis, 3rd Edition, Chapter 17.


Takahiro Kawashima

May 29, 2018


Transcript

  1. Table of contents
     1. Aspects of robustness
     2. Overdispersed versions of standard models
     3. Posterior inference and computation
     4. Robust inference for the eight schools
     5. Robust regression using t-distributed errors
  2. Aspects of robustness — Review: Eight Schools
     • Analyzes the effect of a special coaching program on the SAT-V test
     • 8 schools participated
     • The Chapter 5 analysis concluded that the average coaching effects are about the same across schools
  3. Aspects of robustness — The Chapter 5 hierarchical model for the Eight Schools
     • θj ∼ N(θj | µ, τ²)
     • yj ∼ N(yj | θj, σj²)
     • The observation variances σj² are treated as known for each school
  4. Aspects of robustness — The Chapter 5 hierarchical model
     Raw per-school mean and standard deviation of the coaching effect:
     School   ȳ·j   σj
     A         28   15
     B          8   10
     C         -3   16
     D          7   11
     E         -1    9
     F          1   11
     G         18   10
     H         12   18
  5. Aspects of robustness — What if we set ȳ·8 = 100?
     Raw per-school mean and standard deviation:
     School   ȳ·j   σj
     A         28   15
     B          8   10
     C         -3   16
     D          7   11
     E         -1    9
     F          1   11
     G         18   10
     H        100   18
  6. Aspects of robustness — What happens with ȳ·8 = 100
     • τ is estimated to be large (the variance of the θj grows)
     • The estimates θ̂j become driven almost entirely by the observations (Eq. 5.17):
       θ̂j = ( (1/σj²) ȳ·j + (1/τ²) µ ) / ( 1/σj² + 1/τ² )
     We want the outlier ȳ·8 = 100 to have little influence on the estimates
     → use a long-tailed distribution
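As a quick numeric illustration of Eq. (5.17): the estimate is a precision-weighted average of the school mean and the population mean, so a large τ leaves the outlying observation almost untouched. The values of µ and τ below are made up for the example.

```python
# Eq. (5.17): precision-weighted combination of the school mean ybar_j
# and the population mean mu.
def theta_hat(ybar_j, sigma_j, mu, tau):
    w_data = 1.0 / sigma_j**2   # precision of the observation
    w_prior = 1.0 / tau**2      # precision of the population distribution
    return (w_data * ybar_j + w_prior * mu) / (w_data + w_prior)

# School H with ybar = 100, sigma = 18; mu = 8 is an illustrative guess.
print(theta_hat(100.0, 18.0, mu=8.0, tau=1000.0))  # large tau: ~100, no pooling
print(theta_hat(100.0, 18.0, mu=8.0, tau=1.0))     # small tau: ~8, strong pooling
```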
  7. Aspects of robustness — Candidate long-tailed distributions
     1. Student's t distribution (ν = 1 gives the Cauchy, ν → ∞ the normal)
     2. A mixture model with a widely spread component
     A good strategy is to try various values of the t degrees of freedom ν, from large to small
     → importance resampling (Section 17.4)
  8. Aspects of robustness — Reference: Student's t distribution
     [Figure: t densities for ν = 1 (Cauchy), ν = 4, ν = 10, and ν → ∞ (Gauss)]
  9. Overdispersed versions of standard models — Limits of normal modeling
     If 10% or more of the data lie farther than 1.5 times the interquartile range from the median, modeling with a normal distribution is inappropriate
     (image: 10% or more of the mass in the shaded region of the boxplot on the right)
     Consider robust alternatives to the standard models
     • Note that they cannot handle underdispersion
  10. Overdispersed versions of standard models — The t distribution as an alternative to the normal
      • Use it when:
        1. aberrant data occasionally occur
        2. you want a prior that occasionally allows extreme parameter values
      • ν = 1 gives the Cauchy distribution, ν → ∞ the normal
      • With plenty of data, ν can also be treated as an unknown parameter
      • Setting a hyperprior that keeps ν away from small values is also recommended
        → for ν = 1, 2 the variance does not exist, which makes the model hard to handle
  11. Overdispersed versions of standard models — The t distribution as an alternative to the normal
      t_ν(yi | µ, σ²) = ∫₀^∞ N(yi | µ, Vi) Inv-χ²(Vi | ν, σ²) dVi
      The t distribution can be interpreted as a mixture of normals with
      1. a fixed mean µ and
      2. a variance Vi following a scaled inverse-χ² distribution
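The scale-mixture representation can be checked by Monte Carlo: drawing Vi from the scaled inverse-χ² and then yi from the normal should reproduce a scaled t_ν. The particular values of ν, µ, σ below are arbitrary choices for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
nu, mu, sigma, n = 5, 2.0, 1.5, 200_000

# Scaled inverse-chi^2 draw: V = nu * sigma^2 / chi^2_nu
V = nu * sigma**2 / rng.chisquare(nu, size=n)
y = rng.normal(mu, np.sqrt(V))          # mixture draws

# Compare with a direct scaled-t draw
t_direct = mu + sigma * rng.standard_t(nu, size=n)
print(np.mean(y), np.mean(t_direct))            # both ~ mu
print(np.var(y), sigma**2 * nu / (nu - 2))      # var of t_nu = sigma^2 * nu/(nu-2)
```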
  12. Overdispersed versions of standard models — The negative binomial as an alternative to the Poisson
      Poisson distribution: Poisson(yi | λ) = (λ^{yi} / yi!) exp(−λ)
      • mean(y) = λ, var(y) = λ
      • We want to decouple the mean and the variance → negative binomial
  13. Overdispersed versions of standard models — The negative binomial as an alternative to the Poisson
      Negative binomial distribution (p. 578):
      NegBin(yi | α, β) = C(yi + α − 1, α − 1) (β / (β + 1))^α (1 / (β + 1))^{yi}
      Setting κ := 1/α and λ := α/β, this can be rewritten as
      NegBin(yi | κ, λ) = (λ^{yi} / yi!) · Γ(1/κ + yi) / ( Γ(1/κ) (1/κ + λ)^{yi} ) · (1 + κλ)^{−1/κ}
      • mean(y) = λ, var(y) = λ(1 + κλ) → just what we wanted
  14. Overdispersed versions of standard models — The negative binomial as an alternative to the Poisson
      Incidentally, writing
      NegBin(yi | r, p) = ( Γ(r + yi) / (yi! Γ(r)) ) (1 − p)^r p^{yi},
      we have
      NegBin(yi | a, 1/(b + 1)) = ∫ Poisson(yi | λ) Gamma(λ | a, b) dλ
      (Suyama [2], pp. 86–87)
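The Poisson–Gamma mixture above is easy to verify by simulation: with λ ∼ Gamma(a, b) (shape a, rate b) and y ∼ Poisson(λ), the marginal has mean a/b and variance (a/b)(1 + 1/b), strictly larger than the mean. The values of a, b are arbitrary for the check.

```python
import numpy as np

rng = np.random.default_rng(5)
a, b, n = 3.0, 0.5, 500_000

# lambda ~ Gamma(shape=a, rate=b); numpy's gamma takes scale = 1/rate
lam = rng.gamma(a, 1.0 / b, size=n)
y = rng.poisson(lam)                      # marginally negative binomial

print(y.mean(), a / b)                    # mean = a/b = 6
print(y.var(), (a / b) * (1 + 1 / b))     # var = (a/b)(1 + 1/b) = 18 > mean
```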
  15. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      Binomial distribution: Binomial(yi | n, p) = C(n, yi) p^{yi} (1 − p)^{n − yi}
      • mean(y) = np, var(y) = np(1 − p)
      • Since 1 − p lies in [0, 1], the variance can never exceed the mean
      → beta-binomial distribution
  16. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      • πj ∼ Beta(πj | α, β)
      • yj ∼ Bin(yj | m, πj)
      • The number of trials m is common to all j
  17. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      Beta-binomial distribution:
      p(yj | α, β) = ∫ Bin(yj | m, πj) Beta(πj | α, β) dπj
      • mean(y) = n α / (α + β)
      • var(y) = n αβ (α + β + m) / ( (α + β)² (α + β + 1) )
      • var(y) / mean(y) = β (α + β + m) / ( (α + β)(α + β + 1) )
      → the mean and the variance are now decoupled
  18. Overdispersed versions of standard models — The beta-binomial as an alternative to the binomial
      • n = 100, m = 30
      • mean(y) = n α / (α + β)
      • var(y) = n αβ (α + β + m) / ( (α + β)² (α + β + 1) )
      α        β        mean   variance
      0.1      0.1      50.0   629.2
      1.0      1.0      50.0   266.7
      10.0     10.0     50.0   59.5
      100.0    100.0    50.0   28.6
      1000.0   1000.0   50.0   25.4
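The table can be reproduced directly from the two moment formulas above (with n = 100 and m = 30, exactly as the slide writes them): the mean stays fixed at 50 while the variance shrinks toward the binomial case as α = β grows.

```python
# Beta-binomial moments as given on the slide (n = 100, m = 30).
def bb_mean(n, alpha, beta):
    return n * alpha / (alpha + beta)

def bb_var(n, m, alpha, beta):
    s = alpha + beta
    return n * alpha * beta * (s + m) / (s**2 * (s + 1))

n, m = 100, 30
for a in (0.1, 1.0, 10.0, 100.0, 1000.0):
    # alpha = beta = a, so the mean is always n/2 = 50
    print(a, bb_mean(n, a, a), round(bb_var(n, m, a, a), 1))
```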
  19. Posterior inference and computation — Notation
      p₀(θ | y): posterior of the non-robust model already fitted to the data
      φ: hyperparameter controlling robustness (e.g. the t degrees of freedom)
      The goal is to sample from (Eq. 17.2)
      p(θ | φ, y) = p(y | θ, φ) p(θ | φ) / p(y | φ) ∝ p(y | θ, φ) p(θ | φ)
      When φ is given only as a range, we also compute p(φ | y):
      p(θ, φ | y) = p(θ | φ, y) p(φ | y) ∝ p(y | θ, φ) p(θ | φ) p(φ)
  20. Posterior inference and computation — A Gibbs sampling example
      Model y = (y₁, ..., yₙ) with yi ∼ t_ν(yi | µ, σ²)
      t_ν(yi | µ, σ²) = ∫₀^∞ N(yi | µ, Vi) Inv-χ²(Vi | ν, σ²) dVi
      For simplicity, take flat priors p(µ) and p(log σ)
      From the joint posterior p(µ, σ², V | ν, y), derive the conditional posteriors of µ, σ², and the Vi
  21. Posterior inference and computation — A Gibbs sampling example
      Iterate the following draws (Section 12.1):
      1. Vi | µ, σ², ν, y ∼ Inv-χ²( Vi | ν + 1, (νσ² + (yi − µ)²) / (ν + 1) )
      2. µ | σ², V, ν, y ∼ N( (Σᵢ yi/Vi) / (Σᵢ 1/Vi), 1 / (Σᵢ 1/Vi) )
      3. p(σ² | µ, V, ν, y) ∝ Gamma( σ² | nν/2, (ν/2) Σᵢ 1/Vi )
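A minimal numpy sketch of the three conditional draws above, with ν fixed and flat priors on µ and log σ (the function and variable names are mine, not the book's):

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_t(y, nu, iters=2000):
    """Gibbs sampler for y_i ~ t_nu(mu, sigma^2) via the scale-mixture form."""
    n = len(y)
    mu, sig2 = np.mean(y), np.var(y)          # crude starting values
    draws = []
    for _ in range(iters):
        # 1. V_i | mu, sigma^2, nu, y ~ scaled Inv-chi^2(nu+1, ...)
        scale = (nu * sig2 + (y - mu) ** 2) / (nu + 1)
        V = (nu + 1) * scale / rng.chisquare(nu + 1, size=n)
        # 2. mu | sigma^2, V, nu, y ~ Normal (precision-weighted mean)
        prec = np.sum(1.0 / V)
        mu = rng.normal(np.sum(y / V) / prec, np.sqrt(1.0 / prec))
        # 3. sigma^2 | mu, V, nu, y ~ Gamma(n*nu/2, rate = (nu/2) * sum 1/V_i)
        sig2 = rng.gamma(n * nu / 2.0, 1.0 / (nu / 2.0 * prec))
        draws.append((mu, sig2))
    return np.array(draws)

# Data with one gross outlier: the t_4 model barely moves mu off 0.
y = np.concatenate([rng.normal(0, 1, 99), [50.0]])
post = gibbs_t(y, nu=4)
print(np.mean(post[500:, 0]))   # robust location estimate, near 0
```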
  22. Posterior inference and computation — A Gibbs sampling example
      • If ν is unknown, extend the sampler with a step that updates ν
        → difficult within Gibbs sampling, so use a Metropolis step or similar
      • The posterior becomes multimodal, so use simulated tempering (replica-exchange MCMC)
  23. Posterior inference and computation — Sampling from the posterior predictive distribution
      1. Draw θ ∼ p(θ | φ, y) from the posterior
      2. Draw ỹ ∼ p(ỹ | θ, φ) from the predictive distribution
  24. Posterior inference and computation — Computing the marginal posterior of the hyperparameter by importance weighting
      • If the model is robust, outliers should have little influence on the marginal posterior p(φ | y) (presumably)
      • Computing the marginal posterior enables a sensitivity analysis
      Sampling from p₀(θ | y) has already been carried out:
      • θˢ, s = 1, ..., S
      Approximate the marginal posterior p(φ | y) of the robust model
  25. Posterior inference and computation — importance weighting
      From Eq. (13.11),
      p(φ | y) ∝ p(φ) p(y | φ)
               = p(φ) ∫ p(y, θ | φ) dθ
               = p(φ) ∫ p(y | θ, φ) p(θ | φ) dθ
               = p(φ) ∫ [ p(y | θ, φ) p(θ | φ) / (p₀(θ) p₀(y | θ)) ] p₀(θ) p₀(y | θ) dθ
               ∝ p(φ) ∫ [ p(y | θ, φ) p(θ | φ) / (p₀(θ) p₀(y | θ)) ] p₀(θ | y) dθ
               ≈ p(φ) [ (1/S) Σₛ p(y | θˢ, φ) p(θˢ | φ) / (p₀(θˢ) p₀(y | θˢ)) ]   (17.3)
  26. Posterior inference and computation — importance weighting
      The quantity in Eq. (17.3),
      p(φ) [ (1/S) Σₛ p(y | θˢ, φ) p(θˢ | φ) / (p₀(θˢ) p₀(y | θˢ)) ],
      is computable
      → the posterior of the robust model's hyperparameter φ can be obtained approximately from the sampling results of the original non-robust model
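A toy sketch of Eq. (17.3), in a setup of my own choosing (single location parameter, flat priors, so the ratio reduces to a likelihood ratio): reuse draws from the normal-model posterior p₀(θ | y) to score the marginal posterior of φ = ν on a small grid.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Toy data and non-robust model: y_i ~ N(theta, 1) with a flat prior,
# so p0(theta | y) = N(ybar, 1/n) in closed form.
y = rng.normal(0.0, 1.0, 100)
n, S = len(y), 5000
theta_s = rng.normal(y.mean(), np.sqrt(1.0 / n), size=S)

# log p0(y | theta^s) under the normal model, one value per draw
log_lik0 = stats.norm.logpdf(y[:, None], loc=theta_s).sum(axis=0)

# Unnormalized p(nu | y): average importance ratio over the p0 draws (17.3)
nus = [1.0, 4.0, 30.0]
post = np.array([
    np.mean(np.exp(stats.t.logpdf(y[:, None], df=nu, loc=theta_s).sum(axis=0)
                   - log_lik0))
    for nu in nus
])
print(post / post.sum())  # relative support; clean normal data favor large nu
```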
  27. Posterior inference and computation — Obtaining an approximate posterior of the robust model by importance resampling
      Sampling from p₀(θ | y) has already been carried out:
      • θˢ, s = 1, ..., S
      • Sample generously, e.g. S = 5000
      From the θˢ, generate a small subsample (e.g. k = 500), drawing with probability proportional to the importance ratio
      p(θˢ | φ, y) / p₀(θˢ | y) = p(θˢ | φ) p(y | θˢ, φ) / ( p₀(θˢ) p₀(y | θˢ) )
  28. Posterior inference and computation — importance resampling (Section 10.4)
      When a few samples have extremely large importance ratios:
      → sampling with replacement risks a sharp loss in how well the subsample represents the approximate posterior
      → use a procedure intermediate between sampling with and without replacement
  29. Posterior inference and computation — importance resampling (Section 10.4)
      1. Draw one sample from the θˢ with probability proportional to its importance ratio, and add it to the subsample
      2. Add another in the same way, except that samples already drawn are never reselected
      3. Repeat steps 1 and 2 until the subsample contains k elements
      If the original sample size were infinite and the subsample sufficiently small, the theoretical justification would hold, but ...
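The three steps above can be sketched directly (a naive O(Sk) loop; the function name is mine). The point of sampling without replacement is visible at the end: a single enormous ratio can occupy at most one slot of the subsample, instead of dominating it.

```python
import numpy as np

rng = np.random.default_rng(2)

def resample_without_replacement(ratios, k, rng):
    """Draw k distinct indices, each with prob. proportional to its ratio."""
    ratios = np.asarray(ratios, dtype=float)
    available = np.ones(len(ratios), dtype=bool)
    chosen = []
    for _ in range(k):
        w = np.where(available, ratios, 0.0)   # zero out already-drawn samples
        idx = rng.choice(len(ratios), p=w / w.sum())
        chosen.append(idx)
        available[idx] = False
    return np.array(chosen)

# One huge importance ratio among 1000 samples:
ratios = np.ones(1000)
ratios[0] = 1e6
sub = resample_without_replacement(ratios, k=100, rng=rng)
print(np.sum(sub == 0))  # the dominant sample appears exactly once
```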
  30. Robust inference for the eight schools — Eight Schools
      The Chapter 5 hierarchical model:
      • θj ∼ N(θj | µ, τ²)
      • yj ∼ N(yj | θj, σj²)
      • The observation variances σj² are treated as known for each school
  31. Robust inference for the eight schools — Eight Schools
      Use a Student's t distribution as the prior (population distribution) of the θj:
      • θj ∼ t_ν(θj | µ, τ²)
      • yj ∼ N(yj | θj, σj²)
      The posterior is
      p(θ, µ, τ | ν, y) ∝ p(y | θ, µ, τ, ν) p(θ, µ, τ | ν).
      The Chapter 5 model is the limiting case
      p(θ, µ, τ | ν → ∞, y).
  32. Robust inference for the eight schools — Robust estimation with the t₄ distribution
      • Table 5.2
      • Raw per-school means and standard deviations
      • ȳ·1 is an outlier
      School   ȳ·j   σj
      A         28   15
      B          8   10
      C         -3   16
      D          7   11
      E         -1    9
      F          1   11
      G         18   10
      H         12   18
  33. Robust inference for the eight schools — Robust estimation with the t₄ distribution
      Table 5.3 (estimated coaching effects under normal modeling):
      School   posterior quantiles: 2.5%   25%   50%   75%   97.5%
      A                              -2      7    10    16    31
      B                              -5      3     8    12    23
      C                             -11      2     7    11    19
      D                              -7      4     8    11    21
      E                              -9      1     5    10    18
      F                              -7      2     6    10    28
      G                              -1      7    10    15    26
      H                              -6      3     8    13    33
  34. Robust inference for the eight schools — Robust estimation with the t₄ distribution
      Gibbs sampling with the degrees of freedom fixed at ν = 4
      Table 17.1 (estimated coaching effects under t modeling):
      School   posterior quantiles: 2.5%   25%   50%   75%   97.5%
      A                              -2      6    11    16    34
      B                              -5      4     8    12    21
      C                             -14      2     7    11    21
      D                              -6      4     8    12    21
      E                              -9      1     6     9    17
      F                              -9      3     7    10    19
      G                              -1      6    10    15    26
      H                              -8      4     8    13    26
  35. Robust inference for the eight schools — Robust estimation with the t₄ distribution
      Gibbs sampling with the degrees of freedom fixed at ν = 4
      → the shrinkage of schools such as A is slightly weaker (which looks suspicious)
      School             posterior quantiles: 2.5%   25%   50%   75%   97.5%
      A (normal model)                         -2      7    10    16    31
      A (robust model)                         -2      6    11    16    34
  36. Robust inference for the eight schools — The importance resampling computation
      Draw S = 5000 samples from p₀(θ, µ, τ | y)
      With ν = 4, generate a subsample of k = 500 based on the importance ratio of each θ:
      p(θ, µ, τ | y) / p₀(θ, µ, τ | y)
        = [ p(µ, τ | ν) p(θ | µ, τ, ν) p(y | θ, µ, τ, ν) / p(y) ] / [ p₀(µ, τ) p₀(θ | µ, τ) p₀(y | θ, µ, τ) / p₀(y) ]
        ∝ p(θ | µ, τ, ν) / p₀(θ | µ, τ)
        = ∏ⱼ₌₁⁸ t_ν(θj | µ, τ) / N(θj | µ, τ)   (17.5)
  37. Robust inference for the eight schools — The importance resampling computation
      • The approximation is adequate for this assessment of robustness
      • It would be dicey if high-precision inference were required
      → the distribution of the log importance ratios reportedly looked suspicious
      Perhaps an effect of the difference in tail heights between the t₄ and the normal distribution?
  38. Robust inference for the eight schools — Sensitivity analysis as ν varies (Figure 17.1)
      • Results of running the same MCMC simulation for ν = 1, 2, 3(, 4), 5, 10, 30(, ∞)
      • No clear effect of varying ν
      • Why does ν = 30 look less robust than ν → ∞?
      • Why are both the mean and the standard deviation shrunk at ν = 30?
  39. Robust inference for the eight schools — Treating ν as an unknown parameter
      Judging from Figure 17.1, the estimates depend only weakly on ν
      → the marginal-posterior analysis by importance weighting is unnecessary
      Put a uniform prior on 1/ν ∈ (0, 1]
      Since the conditional prior p(µ, τ | ν) ∝ 1 is improper, write its ν dependence as p(µ, τ | ν) ∝ g(ν)
      Because ν ≤ 2 is problematic, interpret µ and τ as the median and the interquartile range respectively, and set g(ν) ∝ 1
  40. Robust inference for the eight schools — Treating ν as an unknown parameter
      • The influence of ν on the posterior is not large (Figures 17.1 and 17.2)
      • If the dependence on ν were stronger, we would reconsider the noninformative prior on (µ, τ, ν)
  41. Robust inference for the eight schools — Discussion
      • The medians and the 50% and 95% intervals were insensitive even under the normal model
      • For 99.9% intervals, however, the results depend heavily on the tails of the distribution, so they may be sensitive to ν
      → of little interest in this example
  42. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      In the usual linear regression model, outliers have a large influence
      → assume the errors follow a t distribution with low degrees of freedom
      The response yi is related to the predictors Xi by
      p(yi | Xi, β, σ²) = t_ν(yi | Xi β, σ²)
      As before, represent t_ν with a normal and a scaled inverse-χ² distribution:
      yi ∼ N(yi | Xi β, Vi)
      Vi ∼ Inv-χ²(Vi | ν, σ²)
  43. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      • Find the posterior mode of p(β, σ | ν, y)
      • Assume the noninformative prior p(β, log σ) ∝ 1
      → the posterior mode can then be found directly, e.g. by Newton's method
      • Alternatively, regarding the Vi as "missing data" lets us use the EM algorithm framework
  44. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      E step:
      Compute the expected sufficient statistics of the normal model; here we need Σᵢ 1/Vi
      Vi | yi, β_old, σ_old, ν ∼ Inv-χ²( Vi | ν + 1, (ν σ_old² + (yi − Xi β_old)²) / (ν + 1) )   (17.6)
      E( 1/Vi | yi, β_old, σ_old, ν ) = (ν + 1) / ( ν σ_old² + (yi − Xi β_old)² )
  45. Robust regression using t-distributed errors — Supplement: deriving the expectation following Eq. (17.6)
      E( 1/Vi | yi, β_old, σ_old, ν ) can be obtained by a change of variables:
      if x ∼ Inv-χ²(x | ν, s²), then y = 1/x follows
      y ∼ Gamma( y | ν/2, ν s²/2 ),
      and using this gives the expectation of 1/Vi
  46. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      M step:
      With W the diagonal weight matrix with entries W_ii = 1/Vi,
      β̂_new = (Xᵀ W X)⁻¹ Xᵀ W y
      (σ̂_new)² = (1/n) (y − X β̂_new)ᵀ W (y − X β̂_new)
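The E and M steps above fit in a few lines of numpy (a sketch with ν fixed; the function name and test data are mine). The E step replaces each 1/Vi by its conditional expectation, and the M step is a weighted least-squares update.

```python
import numpy as np

rng = np.random.default_rng(3)

def t_regression_em(X, y, nu, iters=50):
    """EM for y_i ~ t_nu(X_i beta, sigma^2): posterior mode of (beta, sigma^2)."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS start
    sig2 = np.mean((y - X @ beta) ** 2)
    for _ in range(iters):
        # E step: w_i = E[1/V_i] = (nu+1) / (nu*sigma^2 + residual^2)  (17.6)
        r = y - X @ beta
        w = (nu + 1) / (nu * sig2 + r ** 2)
        # M step: weighted least squares with W = diag(w)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        r = y - X @ beta
        sig2 = np.mean(w * r ** 2)
    return beta, sig2

# Line y = 1 + 2x with small noise and a few gross outliers.
x = np.linspace(0, 1, 100)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(0, 0.1, 100)
y[:5] += 20.0                                        # contamination
beta, _ = t_regression_em(X, y, nu=4)
print(beta)  # close to [1, 2] despite the outliers
```

Each iteration downweights observations with large residuals, which is exactly the iteratively-reweighted-least-squares reading of the next slide.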
  47. Robust regression using t-distributed errors — Iteratively reweighted linear regression and the EM algorithm
      This EM iteration is equivalent to iteratively reweighted least squares:
      1. initial parameter estimates are given
      2. for each observation, the weight is decreased when the residual is large
      If ν is to be treated as unknown, use the ECME algorithm
      → add a step that updates the degrees of freedom
  48. Robust regression using t-distributed errors — Gibbs sampler and the Metropolis algorithm
      Sampling from the posterior is possible with a Gibbs sampler or the Metropolis algorithm
      Sample from the joint posterior p(β, σ², V₁, ..., Vₙ | ν, y):
      1. p(β, σ² | V₁, ..., Vₙ, ν, y)
      2. p(V₁, ..., Vₙ | β, σ², ν, y):
         Vi | β, σ, ν, y ∼ Inv-χ²( Vi | ν + 1, (νσ² + (yi − Xi β)²) / (ν + 1) )
      3. If ν is unknown, update ν with a Metropolis step
      When ν is small, the posterior tends to be multimodal, which is painful
      → it is important to run simulations from a variety of starting points
  49. References
      [1] Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian Data Analysis, Third Edition. CRC Press, 2014.
      [2] Atsushi Suyama. Introduction to Machine Learning by Bayesian Inference (in Japanese). Kodansha, 2017.