Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
solving of multi-armed bandit problem in advert...
Search
yoppi
April 20, 2018
2
7.3k
solving of multi-armed bandit problem in advertisement recommendation
yoppi
April 20, 2018
Tweet
Share
More Decks by yoppi
See All by yoppi
Applying oCPC algorithm for production
yoppi
2
710
recommendation system with document similarity
yoppi
0
3.1k
RailsはRubyだ
yoppi
0
230
Featured
See All Featured
The Cult of Friendly URLs
andyhume
78
6k
VelocityConf: Rendering Performance Case Studies
addyosmani
325
24k
Happy Clients
brianwarren
98
6.7k
The World Runs on Bad Software
bkeepers
PRO
65
11k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
126
18k
Java REST API Framework Comparison - PWX 2021
mraible
PRO
28
8.2k
A Tale of Four Properties
chriscoyier
156
23k
jQuery: Nuts, Bolts and Bling
dougneiner
61
7.5k
Navigating Team Friction
lara
183
14k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
28
2k
Designing for Performance
lara
604
68k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
27
4.3k
Transcript
ࠂϨίϝϯυʹ͓͚Δ ଟόϯσΟοτͷద༻ͱղ๏ )JSPLB[V:PTIJEB!ZPQQJCMPH 4QFFF *OD ΦϨγΧφΠτ
import “github.com/yoppi” w !ZPQQJCMPH4QFFF *OD w 6;06UFDIMFBEFOHJOFFS w ϨίϝϯυΤϯδϯ࡞ͬͨΓ w
Ξυαʔό࡞ͬͨΓ w ࠂۀք͘Β͍ʢಈըࠂɺωΠςΟϒΞυʣ w (Pɺ+BWB4DSJQUɺ3VCZ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυ ࠂೖߘ ࠂ৴ ΞυωοτϫʔΫ ʜ ʜ ϝσΟΞ ࠂཧళ
6;06ͷࠂϨίϝϯυ هࣄɾίϯςϯπ ʮ͋ͳͨʹ͓͢͢Ίͷهࣄʯ ϝσΟΞͷهࣄԼ෦ʹ ΟδΣοτͱ͍͏ܗͰ ࠂΛ৴͢Δ
6;06ͷࠂϨίϝϯυ
6;06ͷࠂϨίϝϯυ ࠂ ࠂ ࠂ/ ͋ΔࠂͰԿΛ৴͢ΕΫϦοΫͯ͘͠ΕΔͷ͔ʁ ʜ
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ ‣ ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ ‣ ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ ‣ ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ ‣ ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓճͷϨίϝϯυΤϯδϯͷత w 6;06ͷച্ʢF$1.ʣΛ্͍͛ͨ w ܭࢉྔΛݮΒ͠ޮΑ͘ϨίϝϯυγεςϜΛӡ༻ ͍ͨ͠
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ w ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ w ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ w ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ w ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ଟόϯσΟοτ ͕Kݸ ࣌ؒtʹ͓͍ͯi(t)Λબ͠ ใुXi(t)ΛಘΔ ͦΕΛ܁Γฦͯ͠ใुΛ࠷େ Խ͍ͤͨ͞ i(t) Xi(t)
ଟόϯσΟοτ w બࢶͷू߹͔ΒҰͭΛબͨ͠ͱ͖ʹɺͦͷબࢶʹ ର͢ΔใुΛಘΔ͕ଞͷબࢶͷใुใಘΒΕͳ͍ w ͦͷঢ়گԼͰ܁Γฦ͠બͨ͠ͱ͖ʹใुͷΛ࠷େԽ ͢Δ͜ͱΛࢦ͢ ‣ iͷਅͷظµi Λ͍ͬͯΔͳΒظ͕࠷େʹͳ
ΔΛҾ͖ଓ͚Ε͍͍͕ͦΕલఏ݅ͱͯ͠Ͱ͖ ͳ͍ w ͦΕ·ͰʹಘͨใुΛͱʹબࢶΛܾఆ͢ΔઓུΛํ ࡦʢQPMJDZʣͱݺͿ
ଟόϯσΟοτ ʲྫʳ εϩοτϚγϯΛճ(T= 100) Ҿ͚Δͱ͖ʹ ใुΛ࠷େԽͤ͞Δʹʁ K = 4
ଟόϯσΟοτ K = 4 ʲࡶͳํࡦʳ nճશ෦ͷΛҾ͍ͯՌͷྑ͔ͬͨΛɺ Γͷճ(100-4n) Ҿ͖ଓ͚Δ
ଟόϯσΟοτ K = 4 n = 3 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ͔͔ͨͩճͷ୳ࡧͰਅͷՌΛݕূ͢Δ͜ͱ ͦ͠͏
i* = argmaxi∈{1, 2, 3, 4}µi
ଟόϯσΟοτ K = 4 n = 20 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ใुͷྑ͍Λར༻͢Δճ͕গͳ͘ɺ ใुΛ࠷େԽͤ͞Δ͜ͱ͍͔͠
i* = argmaxi∈{1, 2, 3, 4}µi
୳ࡧͱࣝར༻ͷτϨʔυΦϑ w ୳ࡧʢFYQMPSBUJPOʣ w ଞͷͷ΄͏͕Ռ͕ྑ͍ͷͰͱ༧ଌͯ͠Ҿ͍ͯΈΔ w ͨͩɺʮͬͱଞʹྑ͍͕͋ΔͷͰʁʯΛ܁Γฦ͢ͱɺՌ ͷ͍͍ΛҾ͘ճ͕গͳ͘ͳΓύϑΥʔϚϯεΛͩ͢͜ͱ͕Ͱ͖ ͳ͍ w
ࣝར༻ʢFYQMPJUBUJPOʣ w ͋Δظؒͷ͍͔ͭ͘ͷͷՌ͕࠷ྑ͍ͷΛ͞Βʹબ͠ଓ͚ Δ w ͨͩɺʮ͜ͷ࠷ߴʂʯͱͳͬͯͦΕ͔Γબ͍ͯ͠Δͱɺଞʹ ͬͱύϑΥʔϚϯε͕ྑ͍Λݟ͚ͭΒΕͳ͍ w ୳ࡧͱࣝར༻ΛͦΕͧΕదʹ࣮ߦ͢ΔํࡦΛݕ౼͢Δ͜ͱ͕ॏཁ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w ୯७ͳ૯Γ݁Ռ͔Βྑ͔ͬͨͷΛҾ͖ଓ͚Δ w 6$#ʢ6QQFSDPOpEFODFCPVOEʣ w ඪຊฏۉʹิਖ਼߲ΛՃ͑ͨͷΛείΞͱͯ͠ΛબͿ w
54ʢ5IPNQTPO4BNQMJOHʣ w աڈʹ؍ଌͰ͖ͨ݁Ռ͔ΒϕΠζ౷ܭͰະདྷͷظΛ༧ ଌ͢Δ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w 6$# w 5IPNQTPO4BNQMJOH
5IPNQTPO4BNQMJOH w ʮ֤͕ظ࠷େͱͳΔ֬ʯΛͱʹϥϯμϜͰΛબ͢Δํ ࡦʢ֬Ұக๏ʣ w ͜ͷ֬ͷΈΛϕΠζ౷ܭͰఆࣜԽͨ͠ͷ͕5IPNQTPO 4BNQMJOH w ظµi ͕ԿΒ͔ͷࣄલ͔ΒΓཱ͍ͬͯΔͱ͠ɺਅͷظµi
ͷࣄޙΛٻΊΛબ͢Δ w ྫ͑ࠂΛྫʹڍ͛Δͱɺࠂ͕ΫϦοΫ͞ΕͨɺΫϦοΫ͞ Εͳ͔ͬͨͱ͍͏ʹྨͰ͖ΔͷͰࣄલϕϧψʔΠ ʢBer(µ): µ∈[0, 1]ʣͱͳΔ w ܭࢉΛ؆ૉʹ͢ΔͨΊɺϕϧψʔΠͷڞࣄલͰ͋Δ#FUB Λ༻͍Δ
5IPNQTPO4BNQMJOHͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤iʹ͍ͭͯni ←0, mi ←0 2:
for i = 1, 2,...,T do 3: ^μi = Beta(mi+α, (ni-mi)+β) 4: i ← argmaxi∈{1,2,...,K}^μi 5: iΛͻ͍ͯใुXi(t)∈{0,1}Λ؍ଌ͢Δ 6: ni ← ni + 1, mi ← mi + Xi(t) 7: end for
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυͷద༻ w ଟόϯσΟοτΛࠂϨίϝϯυΞϧΰϦζϜͱͯ͠ 6;06ద༻ w ͕ࠂʹରԠ͢Δ w ํࡦͱͯ͠5IPNQTPO4BNQMJOHΛ࠾༻ w ࣄલͱࣄޙͦͷࠂͷ$53ͷ
w ຖͷ৴݁Ռ $53 Λࣄલͱͯ͠ࣄޙΛܭࢉ͠ɺ ৴͢ΔࠂΛબ͢Δ w #FUBͷܭࢉ࣌ʹ͓͚ΔЋ ЌώϡʔϦεςΟοΫʹܾΊ ଧͪ
5IPNQTPO4BNQMJOHΛબΜͩ༁ w όονߋ৽ʹؤ݈Ͱ͋Δ w ຊ൪ӡ༻͢ΔࡍʹσʔλαΠζ͔Β͘Δܭࢉྔͷ੍ʹΑΓ ใु݁ՌΛϦΞϧλΠϜͰ؍ଌ͢Δ͜ͱ͕͘͠Ԇͯ͠ߋ৽ ͢Δ͜ͱ͕ଟ͍ w ͕ɺͷબ࣌ʹʮظ࠷େͱͳΔ֬ʹ͕ͨͬͯ͠ϥϯμ ϜͰબ͢ΔʯͨΊɺՌͷѱ͍ʹภΔ͜ͱ͕গͳ͍
w ܭࢉྔ͕গͳ͍ w جຊతͳϩδοΫࣄલ͔ΒࣄޙΛ#FUBΛܭࢉ͢ Δ͚ͩͰΑ͍ͷͰܭࢉྔΛ͑ΒΕΔ w ଟ͘ͷϞσϧͰཧݶքΛୡͰ͖Δ͜ͱ͕ΒΕ͍ͯΔ
54Λࠂʹద༻ͨ͠߹ͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤ࠂiʹ͍ͭͯ impressioni ←0, clicki ←0
2: for i = 1, 2,...,T do 3: ^μi = Beta(clicki+α, (impressioni-clicki)+β) 4: i←argmaxi∈{1,2,…,K}^μi*CPCi*1000 5: ࠂάϧʔϓiΛͻ͍ͯใुXi(t)∈{0, 1} Λ؍ଌ͢Δ 6: impressioni ←impressioni+1, clicki ←clicki+Xi(t) 7: end for
Ћ ЌͬͯͲ͏ܾΊΕ͍͍ͷʁ w ࣄલʢ৽نʹೖߘ͞Εͨࠂʣ͕ͳ͍ࠂʹ͓͍ ͯҙຯΛͳ͢ w ؾΛ͚ͭΔ͖ͱ͜ΖฏۉͱࢄͰΦʔμʔ͕มΘ Δ
ϨίϝϯυγεςϜߏ ࠂϚελ ৴ϩά ৴αʔόɾ࣮ੵूܭαʔό ৴σʔλ 0O.FN $BDIF ৴σʔλ࡞όον 0O.FN $BDIF
.FEJB "EWFSUJTFS
(PWT1ZUIPO w ݁ہͷͱ͜Ζ5IPNQTPO4BNQMJOHࣄલ͔Βࣄ ޙΛٻΊΔͱ͖ʹ#FUBΛܭࢉ͢Δ͚ͩͰΑ͍ w ߴͳֶతͳܭࢉٻΊΒΕͳ͍ w ѹతʹ(PͰܭࢉ͢Δ΄͏͕ճͷόονʹΑΔ࣮ߦ ࣌ؒΛ͘Ͱ͖Δ w
όϯσΟοτΛղ͘ͱ͖ʹͷߋ৽ִؒΛͳ Δ͘খ͘͞ͳΔ΄͏͕ྑ͍ w ͦͷͨΊ(PΛ࠾༻࣮ͯ͠
(PWT1ZUIPO ࣮ߦ࣌ؒ T /
(P 1ZUIPO
ద༻݁Ռ w 6;06Λಋೖ͍ͯ͠ΔϝσΟΞʹద༻ w ৴͍ͯ͠Δɺ͔ͭɺJNQSFTTJPO͕ଟ͍ϝσΟΞ w 6;06શମͷF$1.Λ΄Ͳ্Ͱ͖ͨ
·ͱΊͱࠓޙͷ՝ w తͱ͍ͯͨ͠ɺܭࢉྔͷݮٴͼF$1.ͷ্ΛୡͰ͖ͨ w ϋΠύʔύϥϝʔλͷ࠷దԽ w Ћ ЌͷΛࠓώϡʔϦεςΟοΫʹఆ͍ٛͯ͠Δঢ়ଶͳͷͰ ͝ͱͷ࠷దͳΛٻΊ͍ͨ w
બ͢Δͷߋ৽සͷ࠷దԽ w શͯͷʹ͓͍ͯ࣌ؒݻఆͰΛߋ৽͍ͯ͠ΔͷͰ͝ͱͷ࠷ దͳߋ৽සΛٻΊ͍ͨ w $73ΛՃຯͨ͠ͷ༧ଌ w $53ͷΈ༧ଌͨ͠ͷͰΛબΜͰ͍ΔͷͰ$73༧ଌ౿·͑ Λબ͍ͨ͠
ࢀߟจݙ w ྛ ΞυωοτϫʔΫʹ͓͚Δࠂ৴ܭըͷ ࠷దԽ ਓೳֶձࢽ7PM/P w ଟόϯσΟοτͷཧͱΞϧΰϦζϜ IUUQJCJTNMPSHBSDIJWFJCJT JCJT@CBOEJUQEGT
w ຊଟ३ʰόϯσΟοτͷཧͱΞϧΰϦζ Ϝʱߨஊࣾ