Upgrade to Pro — share decks privately, control downloads, hide ads and more …

バンディット問題の理論とアルゴリズム 第8章 / bandit-8

todesking
December 27, 2019

バンディット問題の理論とアルゴリズム 第8章 / bandit-8

todesking

December 27, 2019
Tweet

More Decks by todesking

Other Decks in Technology

Transcript

  1. ࿈ଓ࿹όϯσΟοτͱ ϕΠζ࠷దԽ wબ୒ࢶ͕࣮਺ͷϕΫτϧͰ͋ΔΑ͏ͳ৔߹ͷόϯσΟοτ ໰୊ wऔΕΔߦಈͷू߹  w࣌ࠁ Ͱͷߦಈ  wߦಈ

    ʹର͢Δใुظ଴஋  w࠷దͳߦಈ  w ͷܗঢ়͸ະ஌ɺ ͷબ୒ࢶ͸࣮࣭తʹແݶͱ͍͏աࠅ ͳঢ়گ ⊂ ℝd t at ∈ a f(a) a* = argmaxa∈ f(a) f(a) a
  2. ࡶԻ wࡶԻͳ͠Ϟσϧ  wࡶԻ͋ΓϞσϧ  w͜ͷষͰ͸ ͷࡶԻ͕৐ΔϞσϧΛߟ͑Δ wࡶԻͳ͠Ϟσϧͷ৔߹ɺ ճͷ୳ࡧͰ࣮֬ʹ ͕Θ͔Δͨ

    Ίɺ ͷ৔߹Λߟ͑Δͷ͕ຊ࣭త wྫ.-ϞσϧͷϋΠύʔύϥϝʔλΛ ɺMFBWFPOFPVU$7 ͷ݁ՌΛ ͱ͢Δɻ wਅ໘໨ʹ-00$7͢ΔͳΒޡࠩͳ͠Ϟσϧ wҰ෦ͷαϯϓϧ͚ͩͰ$7͢ΔͳΒޡࠩ͋ΓϞσϧ Xt = f(at ) Xt = f(at ) + ϵt , E[ϵt ] = 0 ϵt ∼ (0,σ2) || a* || = ∞ a f(a)
  3. ࿈ଓ࿹όϯσΟοτ໰୊ʹ͓͚Δ ϦάϨοτ w  wใु࠷େͷબ୒ࢶΛબͼଓ͚ͨ৔߹ͱ࣮ࡍͷใुͷࠩ wྦྷੵใु࠷େԽΛ໨తͱ͢Δ w୯७ϦάϨοτ  w ࣌ࠁ

    ʹ͓͍ͯ΋ͬͱ΋ใुظ଴஋͕ଟ͍ߦಈ w࠷େͷใुͱ࠷ऴతʹબ͹Εͨߦಈͷใुͷࠩ wࡶԻͳ͠Ϟσϧͷ৔߹ɺ  w࠷ద࿹ࣝผΛ໨తͱ͢Δ regret(T) = T ∑ t=1 (f(a*) − f(at )) Δ(T) = f(a*) − f( ̂ a*(T)) ̂ a*(T) T Δ(T) = f(a*) − max i∈{1,…,T} f(at )
  4. Ψ΢εաఔ: ໨త w؍ଌ͞Εͨσʔλ͔Βɺະ஌ͷؔ਺ Λਪఆ͍ͨ͠ wೖྗ  wݶΒΕͨσʔλ͔Βͷਪఆ݁Ռ͸ෆਖ਼֬ wˠ৴པ౓Λ֬཰෼෍Ͱද͍ͨ͠ wؔ਺ͦͷ΋ͷͰ͸ͳ͘ɺͦͷ෼෍ ΛٻΊΔ

    f(x) : ℝd → ℝ X = {x1 , ⋯, xn }, y = {f(x1 ), ⋯, f(xn )} P[f |X, y] https://www.ism.ac.jp/~daichi/lectures/H26-GaussianProcess/gp-lecture2-daichi.pdf ؍ଌσʔλ(ेࣈ)Λݩʹ༧ଌ͞Εͨyͷ෼෍ɻ ظ଴஋͕࣮ઢɺ৴པ͕۠ؒ੨͍ྖҬͰࣔ͞Ε͍ͯΔ
  5. Ψ΢εաఔ: ʹ͍ͭͯͷԾఆ f w؍ଌσʔλ͔Β ͷ෼෍ΛٻΊΔʹ͸ԿΒ͔ͷԾఆ͕͍Δ wԾఆ ͸جఈؔ਺ ʹΑΔҰൠԽઢܕϞσϧ  Ͱ͋Γɺࣄલ෼෍

    Ͱ͋Δ w͋Δ ʹର͢Δ ͷ஋͸ɺ Λ࢖ͬͯ ͱॻ͚Δ w͜ͷͱ͖ɺ ͸ଟมྔΨ΢ε෼෍ʹै͍ɺ Ͱ͋Δ wΧʔωϧؔ਺ Λ༻͍Δͱ ͱॻ͚Δ f f ϕ(x) = (ϕ1 (x), …, ϕd (x)) f(x) = wTϕ(x) P(w) = (0, α−1I) X = (x1 , …, xN )T y = ( f(x1 ), …, f(xn ))T Φ = ϕ1 (x1 ) … ϕd (x1 ) ⋮ ⋱ ⋮ ϕ1 (xN ) … ϕd (xN ) y = Φw y y = (0, α−1ΦΦT) k(xn , xm ) = α−1ϕ(xn )Tϕ(xm ) y = (0, k(x1 , x1 ) ⋯ k(x1 , xN ) ⋮ ⋱ ⋮ k(xN , x1 ) ⋯ k(xN , xN ) )
  6. Ψ΢εաఔ: ༧ଌ w ͕؍ଌ͞Ε͍ͯΔঢ়ଶͰɺ ͷ෼෍Λٻ Ί͍ͨ wؔ਺ͷ෼෍ͱ͸Կ͔೚ҙͷ ʹରͯ͠ɺ ͷಉ࣌෼෍ ͕Θ͔Ε͹

    ͷ෼෍͕Θ ͔ͬͨ͜ͱʹͳΔ wଟมྔΨ΢ε෼෍ͷ৚݅෇͖෼෍ʹΑͬͯٻΊΒΕΔ w   ͱͯ͠ɺ Ͱ͋Δͱ͖ɺ ʹͳΔ wະ஌ͷ ͷ෼෍Λط஌ͷ Ͱදݱ͢Δ͜ͱ͕Ͱ͖ͨ X = {x1 , ⋯, xn }T, y = {f(x1 ), ⋯, f(xn )}T f X* = {x* 1 , …, x* M }T y* = {f(x* 1 ), ⋯, f(x* M )}T P[y*|X*, X, y] f K(n, m) = k(xn , xm ) k* (n, m) = k(xn , x* m ) k** (n, m) = k(x* n , x* m ) ( y y*) ∼ ( 0, ( K k* kT * k** )) P[y*|X*, X, y] = (kT * K−1y, k** − kT * K−1k*) y* X*, X, y
  7. ϕΠζ࠷దԽ: Χʔωϧ wόϯσΟοτຊʹ໭Δ wΧʔωϧͷબ୒ wਖ਼ఆ஋Χʔωϧؔ਺ ʹରͯ͠ ͱ͢Δͷ͕Ұൠ త w 

    wεέʔϧύϥϝʔλ ΛͱΔ wΨ΢εΧʔωϧ  wΨ΢εΧʔωϧͷҰൠԽͰ͋ΔϚλʔϯΧʔωϧ wઢܗΧʔωϧ Λ࢖༻ͨ͠৔߹ઢܗόϯσΟοτ ষ ͱಉ౳ g k(a, a′ ) = σ2 0 g(∥a − a′ ∥λ ) ∥a∥λ = d ∑ i=1 ai /λ2 i λ g(z) = exp(−z2/2) k(a, a′ ) = σ2 0 aTa′
  8. ࿈ଓ࿹όϯσΟοτͷํࡦ: GP-UCB w ͕Ψ΢εաఔʹै͏ͱͨ͠৔߹ɺࠓ·Ͱͷ݁Ռ͔Βߦಈ ͰಘΒΕΔ ใु ͷฏۉ ͱ෼ࢄ ͕ܭࢉՄೳ wϕΠζ৴པ۠ؒͷ্ݶ͸

    ͱͳΔ w ৴པ౓ w֤࣌ࠁͰ Λ࠷େԽ͢Δ ΛબͿํࡦ͕(16$# wϦάϨοτΛ࠷খԽͤ͞Δͷ͕໨త͕ͩɺ Λେ͖͘औΔͱ୯७Ϧά Ϩοτͷ࠷খԽʹ΋ରԠՄೳ w Ψ΢εΧʔωϧ w ࣍਺ ͷϚλʔϯΧʔωϧ f a f(a) μ(a|Xt ) σ(a|Xt ) ¯ μa (t) = μ(a|Xt ) + αt σ(a|Xt ) αt ̂ μa (t) a αt regret(T) = O ( T(log T)d+2 ) regret(T) = O (T ν + d(d + 1) 2ν + d(d + 1) log T) 1 < ν < ∞
  9. ࿈ଓ࿹όϯσΟοτͷํࡦ: τϯϓιϯநग़ wΨ΢εաఔΛલఏͱ͍ͯ͠ΔͨΊɺ ͷ෼෍͕ಘΒΕΔ wՄೳͳߦಈ Λ཭ࢄԽ͠ɺ༗ݶͷީิ ʹରͯ͠ ͷ஋ ΛαϯϓϦϯά͠ɺ࠷େͱͳΔߦಈΛબͿ wΨ΢εաఔΛ࢖ͬͨํࡦʹڞ௨͢Δಛ௃ͱͯ͠ɺ

    ࣍ݩߦྻ ʹؔ͢Δܭࢉ͕ඞཁࢼߦ͕ଟ͘ͳΔͱܭࢉྔ͕૿͢ w ͷܭࢉྔφΠʔϒʹ΍ͬͯ  wۙࣅ͢Ε͹ݮΒͤ͸͢Δ w3BUFTPG$POWFSHFODFGPS4QBSTF7BSJBUJPOBM (BVTTJBO1SPDFTT3FHSFTTJPO*$.-#FTUQBQFS wઢܗόϯσΟοτʹ͓͍ͯ͸͜ͷ໰୊͕ͳ͍͔ΘΓʹɺແݶ ࣍ݩΛѻ͑ͳ͍ f(a) a′ s f(a ∈ a′ s ) t K−1 O(N3)
  10. ࿈ଓ࿹όϯσΟοτͷํࡦ: ظ଴վળྔํࡦ wΨ΢εաఔʹ͓͍ͯ୯७ϦάϨοτͷ࠷খԽΛ໨ࢦ͢ํࡦ w ճͷߦಈͷதͰҰ൪ྑ͔ͬͨใु  w ճͷߦಈޙͷ୯७ϦάϨοτ  w࣌ࠁ5ʹ͓͚Δ࠷దߦಈ

     wظ଴վળྔ  wߦಈ ʹΑͬͯɺࠓ·Ͱ؍ଌ͞Εͨ࠷େ஋͔ΒͲΕ͚ͩվળ͞ΕΔ͔ ͷظ଴஋ w ͕࠷େʹͳΔબ୒ࢶΛબͿ T ̂ f* T = max{f(aT ), ̂ f* T−1 } T Δ(T) = f(a*) − ̂ f* T ̂ aT = argmaxa∈ E[max{f(a), ̂ f* T−1 }| f(aT−1 )] = argmaxa∈ E[max{f(a) − ̂ f* T−1 ,0}| f(aT−1 )] EI(a| f(at )) = E[max{f(a) − ̂ f* t ,0}| f(at )] a EI(a| f(at ))
  11. ڞ෼ࢄؔ਺ͷύϥϝʔλਪఆ w࣌ࠁ ʹ͓͚Δߦಈ Λܾఆ͢Δࡍɺ Λ ༻͍Δํࡦ wಛʹ ͕খ͍͞͏ͪ͸ɺਅ஋͔Β͔͚཭ΕͨྖҬʹਪఆ஋͕ऩଋ ͢Δ৔߹͕͋Δ w&*ํࡦͷ৔߹ɺ

    Ͱ΋࠷ѱ࣌Ͱ͸୯७ϦάϨοτ͕ʹऩଋ ͠ͳ͍ wແݶʹࢼߦͯ͠΋ਖ਼͍͠౴͕͑ಘΒΕͳ͍ʜʜ wڞ෼ࢄύϥϝʔλʹؔ͢ΔࣄޙฏۉΛͱΔํࡦ wԿΒ͔ͷείΞؔ਺ Λ࠷େԽ͢Δ Λબͼ͍ͨ৔߹ wࣄલ෼෍ ௨ৗ͸Ұ༷෼෍ Λஔ͖ɺ Λ࠷େ Խ͢Δ ΛબͿ t + 1 at+1 ̂ θt = argmaxθ∈Θ L(θ; Xt ) t t → ∞ uθ (a) a π(θ) Eθ∼π(θ|Xt ) [uθ (a; Xt )] a