monochromegane
September 15, 2022

 A Contextual Multi-Armed Bandit Method That Handles Nonlinearity Without Sacrificing Sequential Adaptability by Using a Rapid Learning Mechanism/extreme_neural_linear_bandits

Transcript

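  1. A Contextual Multi-Armed Bandit Method That Handles Nonlinearity Without Sacrificing Sequential Adaptability by Using a Rapid Learning Mechanism
     Yusuke Miyake (1, 2), Tsunenori Mine (3)
     1. Pepabo R&D Institute, GMO Pepabo, Inc.
     2. Department of Advanced Information Technology, Graduate School of Information Science and Electrical Engineering, Kyushu University
     3. Department of Advanced Information Technology, Faculty of Information Science and Electrical Engineering, Kyushu University
     2022.09.15 SMASH22 Summer Symposium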
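  2. Adaptive systems and communication cost
     • To realize an adaptive system, it is important that the system understands the user's situation well
     • For an e-commerce site, grasping the user's preferences lets the system propose the most suitable products and navigation paths
     • In a production system, communication comes at a cost
     • Requirements and preferences are not clear-cut (even to the users themselves) and take shape gradually
     • The burden and opportunity loss during that period affect sales and similar metrics in both the short and the long term
     • In particular, where requirements and preferences change, the system must keep up communication that currently has low value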
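  3. Extending the multi-armed bandit problem
     • The problem assumes that the reward distribution of each arm is always the same
     • → But might the reward distribution differ by situation or attribute?
     • Examples: popular products differ by age group; the user recently bought a product in the same category
     • The problem has been extended to the "contextual" multi-armed bandit problem
     • → Solutions to it estimate the relationship between the context* and the reward
     • *The context is the situation (and similar information) converted into a form that an information system can handle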
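  4. Conventional contextual multi-armed bandit solutions
     • Assume a linear relationship between the context and the reward, and estimate it
     • LinUCB [L. Li 2010], Linear Thompson Sampling [S. Agrawal 2013]
     Example: arm selection in LinUCB
     $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$
     $\tilde{\theta}^{(k)} = U^{(k)} v^{(k)}$, $U^{(k)} = \left( \sum_{i=1}^{N^{(k)}} x_i x_i^\top \right)^{-1}$, $v^{(k)} = \sum_{i=1}^{N^{(k)}} x_i y_i$
     The first term is a linear model that assumes the reward is the sum of products of the components of the input context $x$ and the parameters $\theta$.
     LinUCB selects the arm with the largest sum of the estimated mean reward and an exploration term that expresses the uncertainty, which depends on the number of trials.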
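The LinUCB selection rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, and a ridge term `lam * I` is added to the sum of outer products (an assumption, made so that $U^{(k)}$ exists before any data arrives; the slide's formula has no such term).

```python
import numpy as np


class LinUCBArm:
    """Per-arm statistics: U = (lam*I + sum x x^T)^-1 and v = sum x*y."""

    def __init__(self, dim, lam=1.0):
        # Assumption: ridge initialisation so that U is defined at N = 0.
        self.U = np.eye(dim) / lam
        self.v = np.zeros(dim)

    def score(self, x, alpha):
        theta = self.U @ self.v                  # estimated parameters
        bonus = alpha * np.sqrt(x @ self.U @ x)  # exploration term
        return x @ theta + bonus

    def update(self, x, y):
        # Rank-one update of the inverse (Sherman-Morrison); U is symmetric.
        Ux = self.U @ x
        self.U -= np.outer(Ux, Ux) / (1.0 + x @ Ux)
        self.v += y * x


def select_arm(arms, x, alpha=0.1):
    """Return the index of the arm with the largest upper confidence bound."""
    return int(np.argmax([arm.score(x, alpha) for arm in arms]))
```

Each arm keeps its own `U` and `v`, so an update touches only the arm that was played.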
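  5. Advancing the contextual multi-armed bandit problem and nonlinear solutions
     • Methods that use a Neural Network (NN) to handle a nonlinear relationship between the context and the reward have appeared [R. Allesiardo 2014, C. Riquelme 2018, M. Collier 2018, D. Guo 2020, D. Zhou 2020, S. Sajeev 2021]
     • For an NN model to perform well, it needs a large amount of training data and enough training time to adapt to that data
     • This lowers adaptability to the diverse and changing requests that users issue one after another (sequential adaptability)
     • If the growth in training time is ignored, updates to the decision criterion are delayed
     • If sequential training is avoided, the latest information cannot be used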
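  6. Research objective and outline of the proposal
     • Realizing an adaptive system requires a mechanism that makes complex decisions rapidly
     • We focus on nonlinear solutions to the contextual multi-armed bandit problem, which formalizes this need
     • We want to resolve the growth in training time that impairs sequential adaptability in conventional solutions
     • We propose integrating an NN model that needs no iterative training and therefore trains quickly
     • In addition, we analyze and discuss how useful this model is for multi-armed bandit solutions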
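  7. Neural Bandit1 [R. Allesiardo 2014]
     • An early nonlinear contextual multi-armed bandit solution that introduced an NN model
     • Uses an arbitrary NN model as the reward function that estimates the reward from the context
     • Exploits and explores arms at a fixed ratio with $\epsilon$-Greedy
     [Figure: the context $x_t$ is fed to a neural network that outputs the per-arm reward estimates $\hat{y}_t^{(1)}, \dots, \hat{y}_t^{(K)}$; the bandit picks $\operatorname{argmax}_{k=1,\dots,K} \hat{y}^{(k)}$ with probability $1 - \epsilon$ and, for all $a \in A$, arm $a$ with probability $\epsilon / K$]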
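The $\epsilon$-Greedy rule in the figure can be sketched in a few lines. The function name and the `rng` parameter are hypothetical; the per-arm estimates are assumed to come from whatever NN reward model is in use.

```python
import random


def epsilon_greedy(predicted_rewards, epsilon, rng=random):
    """Pick the greedy arm with probability 1 - epsilon, else a uniform arm."""
    k = len(predicted_rewards)
    if rng.random() < epsilon:
        # Exploration: every arm is drawn with probability epsilon / K.
        return rng.randrange(k)
    # Exploitation: the arm with the largest estimated reward.
    return max(range(k), key=lambda a: predicted_rewards[a])
```

Because the exploration ratio is fixed, the rule does not use any measure of how uncertain the NN estimate is.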
  8. Neural Bandit1 [R. Allesiardo 2014]
     • The work revealed two challenges in introducing an NN model into a multi-armed bandit solution
       1. Accounting for the uncertainty of the NN model in exploitation and exploration
       2. Ensuring sequential adaptability
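  9. 1. Accounting for model uncertainty in exploitation and exploration
     • The challenge is how to express, in an NN model, the model uncertainty used for exploiting and exploring arms
     • That is, how to express with an NN model the value that corresponds to the exploration term of LinUCB:
       $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( x^\top \tilde{\theta}^{(k)} + \alpha \sqrt{x^\top U^{(k)} x} \right)$
     • Most existing solutions that introduce an NN model focus on this challenge
     • Solutions that apply Dropout at prediction time and treat the resulting spread of estimates as the uncertainty [C. Riquelme 2018][M. Collier 2018]
     • Solutions that select stochastically among several neural network models built by Bootstrap and treat the resulting spread of estimates as the uncertainty [C. Riquelme 2018][D. Guo 2020]
     • Solutions that express the uncertainty with the gradient with respect to the parameters, obtained from the difference from the estimated reward [D. Zhou 2020]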
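  10. 2. Ensuring sequential adaptability
     • Neural Linear [C. Riquelme 2018] addresses both challenges by separating the NN model from the model of the multi-armed bandit solution
     • It uses an arbitrary NN model as a converter from the original context to a new context that better expresses the relationship with the reward
     • Exploitation and exploration of arms use a conventional linear solution (addresses the first challenge)
     • The training interval of the NN model is decoupled from the sequential update interval of the bandit solution (mitigates the second challenge, but the NN cannot use the latest information)
     [Figure: the NN converts $x_t$ into $\tilde{x}_t^{(1)}, \dots, \tilde{x}_t^{(K)}$, which feed the Bandit (LinUCB, Linear Thompson Sampling, etc.)]
     $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$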
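  11. 2. Ensuring sequential adaptability (conventional approaches in NN models)
     • Backpropagation, which is used to train NN models, advances training iteratively with gradients derived from the prediction error
     • → As the training data grows, the training time needed for the result to converge grows as well
     • Stochastic gradient descent and optimization algorithms for gradient descent [D.P. Kingma 2014] have been proposed to shorten the time to convergence
     • → Training time still grows with the amount of training data
     • Another approach trains incrementally on only the most recently obtained data
     • → Catastrophic forgetting [J. Kirkpatrick 2017] occurs, and a drop in accuracy has also been reported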
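  12. Structure of ELM
     • An NN model made of a single hidden layer with $L$ units (the output is restricted to a scalar here, on the premise of integration with a multi-armed bandit solution)
     [Figure: network diagram with $d$ input units and $L$ hidden units]
     $x \in \mathbb{R}^{d}$, $W \in \mathbb{R}^{L \times d}$, $b \in \mathbb{R}^{L}$, $\beta \in \mathbb{R}^{1 \times L}$
     $h_1 = \phi\left( \sum_{i=1}^{d} W_{1,i} x_i + b_1 \right)$
     $h(x) = \phi.(Wx + b)$
     $\hat{y} = \beta h(x)$
     $\phi$ is an arbitrary activation function; $\phi.$ is the operation that applies $\phi$ element-wise.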
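  13. Training ELM (input layer to hidden layer)
     • The weights $W$ and the bias $b$ are initialized with random numbers and are not estimated. The model can then be regarded as a linear model with a feature function $h(x)$ that makes the input nonlinear
     [Figure: network diagram with $d$ input units and $L$ hidden units]
     $W = (w_{i,j})_{1 \le i \le L,\, 1 \le j \le d}$, $w_{i,j} \sim P(\theta)$
     $b = (b_i)_{1 \le i \le L}$, $b_i \sim P(\theta)$
     $x \in \mathbb{R}^{d}$, $\beta \in \mathbb{R}^{1 \times L}$
     $h_1 = \phi\left( \sum_{i=1}^{d} W_{1,i} x_i + b_1 \right)$
     $h(x) = \phi.(Wx + b)$
     $\hat{y} = \beta h(x)$
     $\phi$ is an arbitrary activation function; $\phi.$ is the operation that applies $\phi$ element-wise.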
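  14. Training ELM (hidden layer to output layer)
     • Least squares is applied to this linear model to estimate the weights $\beta$. The estimate $\hat{\beta}^\top$ is the minimizer of the prediction error $\lVert H \beta^\top - y \rVert^2$ on the training data $X$ and $y$
     [Figure: network diagram with $d$ input units and $L$ hidden units]
     $\hat{\beta}^\top = (H^\top H)^{-1} H^\top y$
     $\hat{y} = \hat{\beta} h(x)$, $x \in \mathbb{R}^{d}$
     Notation:
     $H = \phi.(X W^\top + B) \in \mathbb{R}^{N \times L}$
     $X = (x_1, \dots, x_N)^\top \in \mathbb{R}^{N \times d}$
     $y = (y_1, \dots, y_N)^\top \in \mathbb{R}^{N}$
     $B = (b, \dots, b)^\top \in \mathbb{R}^{N \times L}$
     $W = (w_{i,j})_{1 \le i \le L,\, 1 \le j \le d}$, $w_{i,j} \sim P(\theta)$
     $b = (b_i)_{1 \le i \le L}$, $b_i \sim P(\theta)$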
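The two ELM training steps (random input weights, then least squares for $\beta$) can be sketched as below. The function names are hypothetical, and two choices are assumptions not stated on the slides: $P(\theta)$ is taken to be a standard normal distribution and $\phi$ is `tanh`. `np.linalg.lstsq` is used in place of the explicit normal equations; it returns the same minimizer when $H$ has full column rank.

```python
import numpy as np


def elm_fit(X, y, L, rng, phi=np.tanh):
    """Fit an ELM: W and b are drawn once at random, only beta is estimated."""
    d = X.shape[1]
    W = rng.normal(size=(L, d))   # assumption: P(theta) = N(0, 1)
    b = rng.normal(size=L)
    H = phi(X @ W.T + b)          # N x L hidden outputs (b broadcast per row)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # argmin ||H beta - y||^2
    return W, b, beta


def elm_predict(x, W, b, beta, phi=np.tanh):
    """Scalar prediction y_hat = beta . h(x) for one context vector x."""
    return phi(W @ x + b) @ beta
```

No gradient steps are involved: training is a single linear solve over the hidden-layer outputs.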
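  15. Training OS-ELM
     • Recursive least squares is applied to the same linear model as in ELM to estimate the weights $\beta$. The results computed up to time $N$ can be referenced as sums of values obtained from the feature function and the outputs
     [Figure: network diagram with $d$ input units and $L$ hidden units]
     $\hat{y} = \hat{\beta}_{N+1} h(x)$, $x \in \mathbb{R}^{d}$
     $\hat{\beta}_{N+1}^\top = (H_{N+1}^\top H_{N+1})^{-1} H_{N+1}^\top y_{N+1}$
     $Q_{N+1} = \left( \sum_{i=1}^{N+1} h(x_i) h(x_i)^\top \right)^{-1} = \left( Q_N^{-1} + h(x_{N+1}) h(x_{N+1})^\top \right)^{-1} = Q_N - \frac{Q_N h(x_{N+1}) h(x_{N+1})^\top Q_N}{1 + h(x_{N+1})^\top Q_N h(x_{N+1})}$ (matrix inversion lemma)
     $r_{N+1} = \sum_{i=1}^{N+1} y_i h(x_i)$
     $W = (w_{i,j})_{1 \le i \le L,\, 1 \le j \le d}$, $w_{i,j} \sim P(\theta)$
     $b = (b_i)_{1 \le i \le L}$, $b_i \sim P(\theta)$
     Note that while $L > N$ the inverse matrix $Q$ cannot be computed, so sequential training is not performed (Boosting period).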
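The recursion on this slide can be sketched as follows, using $Q = (H^\top H)^{-1}$ and $r = H^\top y$ so that $\hat{\beta}^\top = Q r$. The function names are hypothetical. The initial batch stands in for the Boosting period: it needs at least $L$ samples so that $H_0^\top H_0$ is invertible.

```python
import numpy as np


def os_elm_init(H0, y0):
    """Batch initialisation on the first N >= L samples (Boosting period)."""
    Q = np.linalg.inv(H0.T @ H0)
    r = H0.T @ y0
    return Q, r


def os_elm_step(Q, r, h, y):
    """Fold one sample (h = h(x_{N+1}), y = y_{N+1}) into Q and r."""
    Qh = Q @ h                                 # Q is symmetric
    Q = Q - np.outer(Qh, Qh) / (1.0 + h @ Qh)  # matrix inversion lemma
    r = r + y * h
    return Q, r                                # beta_hat = Q @ r
```

Each step costs a few matrix-vector products of size $L$, independent of how many samples have been seen, which is why the update time stays constant as trials accumulate.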
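  16. Proposed method: Extreme Neural Linear Bandits
     [Figure: the Neural Network (OS-ELM + regularization) applies the transform function $\beta^\top \otimes h(x)$ to $x_t$ and passes $\tilde{x}_t^{(1)}, \dots, \tilde{x}_t^{(K)}$ to the Bandit (LinUCB, Linear Thompson Sampling, etc.)]
     $a^{(k^*)} = \operatorname{argmax}_{k=1,\dots,K} \left( \tilde{x}^{(k)\top} \tilde{\theta}^{(k)} + \alpha \sqrt{\tilde{x}^{(k)\top} U^{(k)} \tilde{x}^{(k)}} \right)$
     • Adopts the Neural Linear [C. Riquelme 2018] scheme (with a view to swapping or improving the NN and the Bandit)
     • To apply OS-ELM within this scheme, introduces "1. a transform function" and "2. a regularization term"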
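  17. Proposed method: transform function for the context
     • To obtain the new context $\tilde{x}$ from the original context $x$, the method uses the element-wise product of the hidden-layer output $h(x_t)$ and the weights $\beta$ between the hidden layer and the output layer
     • Conventional Neural Linear uses the output of the final hidden layer as is
     • → In OS-ELM, reflecting the weights that actually contribute to capturing the nonlinearity improves the usefulness of the result as a context
     $\tilde{x}_t^{(\mathrm{NeuralLinear})} = h(x_t)$
     $\tilde{x}_t^{(\mathrm{ExtremeNeuralLinearBandits})} = \beta^\top \otimes h(x_t)$ (also uses the output-layer weights)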
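The two context transforms differ by one element-wise multiplication, as the sketch below shows (function name hypothetical). One consequence of the definition is that the components of the proposed context sum to the OS-ELM prediction, $\sum_j \tilde{x}_j = \beta h(x) = \hat{y}$.

```python
import numpy as np


def extreme_neural_linear_context(beta, h):
    """New context: element-wise product of output weights and hidden output."""
    return beta * h


beta = np.array([0.5, -2.0, 1.0])   # output-layer weights (length L)
h = np.array([0.2, 0.1, -0.4])      # hidden-layer output h(x_t)

x_neural_linear = h                                  # conventional Neural Linear
x_proposed = extreme_neural_linear_context(beta, h)  # proposed transform
```

Since only $\beta$ is trained in OS-ELM, this transform is the step through which the learned part of the network reaches the bandit.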
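  18. Proposed method: introducing regularization into OS-ELM
     • In OS-ELM, while the number of trials $N$ is below the number of units $L$, the model is in the Boosting period and cannot be trained, so opportunity loss occurs during that period
     • The proposed method applies ridge regression to OS-ELM, which allows sequential training from the very first trials
     • The estimate $\hat{\beta}^\top$ is the minimizer of $\lVert H \beta^\top - y \rVert^2 + \lambda_{nn} \lVert \beta^\top \rVert^2$, the original prediction error plus a regularization term
     • Because it constrains the norm of the parameters, it can also be expected to prevent overfitting and improve generalization
     $\hat{\beta}_{N+1}^\top = (H_{N+1}^\top H_{N+1} + \lambda_{nn} I)^{-1} H_{N+1}^\top y_{N+1}$
     The recursion is changed so that it starts from $\lambda_{nn} I$ at $N = 0$.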
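A minimal sketch of the regularized variant follows. Starting from $Q_0 = (\lambda_{nn} I)^{-1}$ means every rank-one update is valid from the first sample, so no Boosting period is needed. The class name is hypothetical; the standard normal initialisation of `W` and `b` and the `tanh` activation are assumptions, since the slides leave $P(\theta)$ and $\phi$ open.

```python
import numpy as np


class RidgeOSELM:
    """OS-ELM whose recursion starts from lam * I, so it can learn at N = 0."""

    def __init__(self, d, L, lam=1.0, seed=0, phi=np.tanh):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(L, d))  # assumption: P(theta) = N(0, 1)
        self.b = rng.normal(size=L)
        self.phi = phi
        self.Q = np.eye(L) / lam          # (lam * I)^-1
        self.r = np.zeros(L)

    def hidden(self, x):
        return self.phi(self.W @ x + self.b)

    @property
    def beta(self):
        return self.Q @ self.r

    def update(self, x, y):
        h = self.hidden(x)
        Qh = self.Q @ h
        self.Q -= np.outer(Qh, Qh) / (1.0 + h @ Qh)  # matrix inversion lemma
        self.r += y * h

    def predict(self, x):
        return self.hidden(x) @ self.beta
```

After any number of updates, including fewer than $L$, `beta` equals the ridge solution $(H^\top H + \lambda_{nn} I)^{-1} H^\top y$ over the samples seen so far.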
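  19. Evaluation method
     • Wheel bandits [C. Riquelme 2018]: a simulation of a nonlinear multi-armed bandit problem
     • At time $t$, for the context $x_t = (x_i)_{1 \le i \le 2}$, $x_i \sim \mathrm{Uniform}(-1, 1)$, the selected arm yields the reward $y_t \sim \mathcal{N}(\mu, \sigma^2)$
     • The mean reward $\mu$ is determined per arm as shown in the figure (where $\mu_2 < \mu_1 \ll \mu_3$)
     [Figure: assignment of the mean reward $\mu$ to each arm over the context space]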
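Because the figure that assigns $\mu$ to each arm did not survive extraction, the sketch below follows the wheel bandit as commonly described for [C. Riquelme 2018]: five arms, where arm 0 always pays $\mu_1$, the other arms pay $\mu_2$ inside the circle of radius $\delta$, and outside it the one arm matching the context's quadrant pays $\mu_3$. The quadrant-to-arm mapping, the arm count, and the function names are assumptions; the default parameter values are the ones listed on the next slide.

```python
import math
import random


def wheel_mean_reward(x, arm, mu1=1.2, mu2=1.0, mu3=5.0, delta=0.7):
    """Mean reward of `arm` (0..4) for a 2-D context x (assumed layout)."""
    if arm == 0:
        return mu1                      # safe arm: same mean everywhere
    if math.hypot(x[0], x[1]) <= delta:
        return mu2                      # inside the circle: all others are poor
    if x[0] > 0:
        best = 1 if x[1] > 0 else 2     # assumed quadrant-to-arm mapping
    else:
        best = 3 if x[1] <= 0 else 4
    return mu3 if arm == best else mu2


def sample_reward(x, arm, sigma2=0.1, rng=random):
    """Draw y_t ~ N(mu, sigma^2) for the chosen arm."""
    return rng.gauss(wheel_mean_reward(x, arm), math.sqrt(sigma2))
```

The best arm depends on both the norm and the quadrant of the context, which a model that is linear in $x$ cannot represent.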
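  20. Evaluation method
     • Simulation of a nonlinear multi-armed bandit problem using Wheel bandits
     • Simulation parameters: $\mu_1 = 1.2$, $\mu_2 = 1.0$, $\mu_3 = 5.0$, $\sigma^2 = 0.1$, $\delta = 0.7$
     • Each simulation runs 5000 trials; results are averages over 50 runs
     • The compared solutions are listed below. Their settings are aligned so that the differences between solutions are clear
     • The NN training interval is set to every 100 trials, matching the time-consuming Neural Linear (Full)

     Solution                                                          | NN model   | Hidden layer | NN regularization | Bandit model | Bandit regularization | Exploration rate
     LinUCB: linear solution                                           | -          | -            | -                 | LinUCB       | λ=1.0                 | α=0.1
     Neural Linear (Differential): nonlinear, differential training    | MLP (Diff) | L=100        | λ=1.0             | LinUCB       | λ=1.0                 | α=0.1
     Neural Linear (Full): nonlinear, full retraining every time       | MLP (Full) | L=100        | λ=1.0             | LinUCB       | λ=1.0                 | α=0.1
     Extreme Neural Linear Bandits (proposed)                          | OS-ELM     | L=100        | λ=1.0             | LinUCB       | λ=1.0                 | α=0.1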
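  21. Evaluating performance on complex decision-making
     • Compares the cumulative reward of the simulation across solutions
     • The linear solution (LinUCB) has a low cumulative reward and cannot adequately handle the nonlinear setting
     • Neural Linear (Full) has the highest cumulative reward
     • Since Differential < Extreme Neural Linear Bandits, the proposed method, as a differential scheme aimed at shortening execution time, maintains its performance on the multi-armed bandit problem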
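  22. Evaluating sequential adaptability
     • Compares, across solutions, the cumulative execution time spent updating the arm evaluations
     • LinUCB, which needs no NN training, is the fastest at 0.05 s (but it handles nonlinearity insufficiently)
     • Next come the proposed method at 3.0 s, Neural Linear (Differential) at 13.1 s, and Full at 28.3 s
     • Reason 1: the proposed method and Differential train on differential data, so their training time is constant regardless of the number of trials
     • Reason 2: the proposed method needs no iterative training. Its execution time per training run is 0.07 s against 0.3 s for Differential, roughly 4.1 times faster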
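  23. 1. Analysis of accuracy as an NN model
     • For the NN models of the solutions evaluated above, compares accuracy by amount of training data
     • The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean absolute prediction error on the test data, summed over the models of all arms
     • Solid lines show the results for regularization parameter $\lambda_{nn} = 1.0$, dotted lines for $\lambda_{nn} = 0.0001$

     Solution                                                  | NN model   | Hidden layer | Regularization
     LinUCB: linear solution                                   | -          | -            | -
     Neural Linear (Differential): nonlinear, differential     | MLP (Diff) | L=100        | λ=1.0
     Neural Linear (Full): nonlinear, full retraining each time | MLP (Full) | L=100        | λ=1.0
     Extreme Neural Linear Bandits (proposed)                  | OS-ELM     | L=100        | λ=1.0

     With MLP(Diff) (iterative training on differential data), the prediction error did not decrease even when viewed purely as an NN model.
     With OS-ELM and MLP(Full) (iterative training on all data), the prediction error decreased as the training data grew, especially in the early phase.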
  24. 1. Analysis of accuracy as an NN model
     Weakening the regularization reduces the prediction error further only for MLP(Full).
     → This contrasts with the cumulative reward in the bandit problem, where stronger regularization gave better results.
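  25. 1. Analysis of accuracy as an NN model
     • For the NN models of the solutions evaluated above, compares accuracy by amount of training data
     • Visualizes the distribution of estimated rewards of the NN model tied to arm $a_2$ (zoomed in around $\mu = \mu_3$)
     [Figure: panels for D = 100 and D = 25k under $\lambda_{nn} = 0.0001$ and $\lambda_{nn} = 1.0$, comparing OS-ELM, MLP (Full), and Truth]
     (1) As the training data grows, the estimate approaches the shape of the true reward distribution
     (2) With weak regularization, the model fits the observed data more readily (overfitting)
     (3) Even when overfitting is tolerated, OS-ELM shows the limits of its expressive power compared with MLP(Full)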
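  26. 2. Relationship between the NN model and the multi-armed bandit solution
     • For the NN models of the solutions evaluated above, compares cumulative reward by amount of training data
     • Uses NN models pre-trained on the specified amount of training data (that is, only the bandit model is trained during the simulation)
     • The horizontal axis is the amount of training data for each arm's model; the vertical axis is the mean total reward over 10 simulations
     The total reward is roughly inversely related to the prediction error of the NN model.
     For MLP(Full) with $\lambda_{nn} = 1.0$, the prediction error was always larger than with $\lambda_{nn} = 0.0001$, yet the total reward reverses from $D = 5k$ onward.
     In the earlier evaluation as well, the cumulative reward was higher with $\lambda_{nn} = 1.0$ than with $\lambda_{nn} = 0.0001$.
     Strengthening the regularization leads to better rewards in the multi-armed bandit.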
  27. 2. Relationship between the NN model and the multi-armed bandit solution
     The results suggest that introducing the regularization term is effective in the proposed method, which assumes that the NN model is also trained sequentially.
     Strengthening the regularization raises the generalization performance of the NN model, so the representation of the context changes gradually and in a consistent direction.
     → This is thought to stabilize the training of the multi-armed bandit solution.