Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
solving of multi-armed bandit problem in advert...
Search
yoppi
April 20, 2018
2
7.8k
solving of multi-armed bandit problem in advertisement recommendation
yoppi
April 20, 2018
Tweet
Share
More Decks by yoppi
See All by yoppi
Applying oCPC algorithm for production
yoppi
2
790
recommendation system with document similarity
yoppi
0
3.2k
RailsはRubyだ
yoppi
0
250
Featured
See All Featured
Designing Experiences People Love
moore
142
24k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Optimising Largest Contentful Paint
csswizardry
37
3.4k
Automating Front-end Workflow
addyosmani
1370
200k
Facilitating Awesome Meetings
lara
55
6.5k
The Pragmatic Product Professional
lauravandoore
36
6.8k
Raft: Consensus for Rubyists
vanstee
140
7.1k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
830
Docker and Python
trallard
45
3.5k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.8k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Transcript
ࠂϨίϝϯυʹ͓͚Δ ଟόϯσΟοτͷద༻ͱղ๏ )JSPLB[V:PTIJEB!ZPQQJCMPH 4QFFF *OD ΦϨγΧφΠτ
import “github.com/yoppi” w !ZPQQJCMPH4QFFF *OD w 6;06UFDIMFBEFOHJOFFS w ϨίϝϯυΤϯδϯ࡞ͬͨΓ w
Ξυαʔό࡞ͬͨΓ w ࠂۀք͘Β͍ʢಈըࠂɺωΠςΟϒΞυʣ w (Pɺ+BWB4DSJQUɺ3VCZ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυ ࠂೖߘ ࠂ৴ ΞυωοτϫʔΫ ʜ ʜ ϝσΟΞ ࠂཧళ
6;06ͷࠂϨίϝϯυ هࣄɾίϯςϯπ ʮ͋ͳͨʹ͓͢͢Ίͷهࣄʯ ϝσΟΞͷهࣄԼ෦ʹ ΟδΣοτͱ͍͏ܗͰ ࠂΛ৴͢Δ
6;06ͷࠂϨίϝϯυ
6;06ͷࠂϨίϝϯυ ࠂ ࠂ ࠂ/ ͋ΔࠂͰԿΛ৴͢ΕΫϦοΫͯ͘͠ΕΔͷ͔ʁ ʜ
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ ‣ ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ ‣ ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ ‣ ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ ‣ ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓճͷϨίϝϯυΤϯδϯͷత w 6;06ͷച্ʢF$1.ʣΛ্͍͛ͨ w ܭࢉྔΛݮΒ͠ޮΑ͘ϨίϝϯυγεςϜΛӡ༻ ͍ͨ͠
ࠂϨίϝϯυͷछྨ w ίϯςϯπϕʔε৴ w ͦͷϝσΟΞͷهࣄͱࠂͷྨࣅ͔ΒϨίϝϯυ͢Δ w ϦλʔήςΟϯά৴ w ࠂओͷΛݟͨ͋ͱʹϢʔβʹରͯͦ͠ͷΛϨί ϝϯυ͢Δ
w λʔήςΟϯά৴ w ϢʔβͷଐੑΛ%.1͔Βਪଌ͠ྨࣅ͢ΔଐੑͷࠂΛϨ ίϝϯυ͢Δ w աڈͷՌʹجͮ͘৴ w ͦͷʹաڈʹ৴͞ΕͨใΛͱʹՌ͕Α͘ͳΔΑ ͏ʹϨίϝϯυ͢Δ
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
ଟόϯσΟοτ ͕Kݸ ࣌ؒtʹ͓͍ͯi(t)Λબ͠ ใुXi(t)ΛಘΔ ͦΕΛ܁Γฦͯ͠ใुΛ࠷େ Խ͍ͤͨ͞ i(t) Xi(t)
ଟόϯσΟοτ w બࢶͷू߹͔ΒҰͭΛબͨ͠ͱ͖ʹɺͦͷબࢶʹ ର͢ΔใुΛಘΔ͕ଞͷબࢶͷใुใಘΒΕͳ͍ w ͦͷঢ়گԼͰ܁Γฦ͠બͨ͠ͱ͖ʹใुͷΛ࠷େԽ ͢Δ͜ͱΛࢦ͢ ‣ iͷਅͷظµi Λ͍ͬͯΔͳΒظ͕࠷େʹͳ
ΔΛҾ͖ଓ͚Ε͍͍͕ͦΕલఏ݅ͱͯ͠Ͱ͖ ͳ͍ w ͦΕ·ͰʹಘͨใुΛͱʹબࢶΛܾఆ͢ΔઓུΛํ ࡦʢQPMJDZʣͱݺͿ
ଟόϯσΟοτ ʲྫʳ εϩοτϚγϯΛճ(T= 100) Ҿ͚Δͱ͖ʹ ใुΛ࠷େԽͤ͞Δʹʁ K = 4
ଟόϯσΟοτ K = 4 ʲࡶͳํࡦʳ nճશ෦ͷΛҾ͍ͯՌͷྑ͔ͬͨΛɺ Γͷճ(100-4n) Ҿ͖ଓ͚Δ
ଟόϯσΟοτ K = 4 n = 3 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ͔͔ͨͩճͷ୳ࡧͰਅͷՌΛݕূ͢Δ͜ͱ ͦ͠͏
i* = argmaxi∈{1, 2, 3, 4}µi
ଟόϯσΟοτ K = 4 n = 20 ͜ͷͱ͖ճՌ͕Α͔ͬͨΛҾ͖ଓ͚Δ͕ɺ ใुͷྑ͍Λར༻͢Δճ͕গͳ͘ɺ ใुΛ࠷େԽͤ͞Δ͜ͱ͍͔͠
i* = argmaxi∈{1, 2, 3, 4}µi
୳ࡧͱࣝར༻ͷτϨʔυΦϑ w ୳ࡧʢFYQMPSBUJPOʣ w ଞͷͷ΄͏͕Ռ͕ྑ͍ͷͰͱ༧ଌͯ͠Ҿ͍ͯΈΔ w ͨͩɺʮͬͱଞʹྑ͍͕͋ΔͷͰʁʯΛ܁Γฦ͢ͱɺՌ ͷ͍͍ΛҾ͘ճ͕গͳ͘ͳΓύϑΥʔϚϯεΛͩ͢͜ͱ͕Ͱ͖ ͳ͍ w
ࣝར༻ʢFYQMPJUBUJPOʣ w ͋Δظؒͷ͍͔ͭ͘ͷͷՌ͕࠷ྑ͍ͷΛ͞Βʹબ͠ଓ͚ Δ w ͨͩɺʮ͜ͷ࠷ߴʂʯͱͳͬͯͦΕ͔Γબ͍ͯ͠Δͱɺଞʹ ͬͱύϑΥʔϚϯε͕ྑ͍Λݟ͚ͭΒΕͳ͍ w ୳ࡧͱࣝར༻ΛͦΕͧΕదʹ࣮ߦ͢ΔํࡦΛݕ౼͢Δ͜ͱ͕ॏཁ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w ୯७ͳ૯Γ݁Ռ͔Βྑ͔ͬͨͷΛҾ͖ଓ͚Δ w 6$#ʢ6QQFSDPOpEFODFCPVOEʣ w ඪຊฏۉʹิਖ਼߲ΛՃ͑ͨͷΛείΞͱͯ͠ΛબͿ w
54ʢ5IPNQTPO4BNQMJOHʣ w աڈʹ؍ଌͰ͖ͨ݁Ռ͔ΒϕΠζ౷ܭͰະདྷͷظΛ༧ ଌ͢Δ
ଟόϯσΟοτͷओཁͳํࡦ w ЏHSFFEZ w 6$# w 5IPNQTPO4BNQMJOH
5IPNQTPO4BNQMJOH w ʮ֤͕ظ࠷େͱͳΔ֬ʯΛͱʹϥϯμϜͰΛબ͢Δํ ࡦʢ֬Ұக๏ʣ w ͜ͷ֬ͷΈΛϕΠζ౷ܭͰఆࣜԽͨ͠ͷ͕5IPNQTPO 4BNQMJOH w ظµi ͕ԿΒ͔ͷࣄલ͔ΒΓཱ͍ͬͯΔͱ͠ɺਅͷظµi
ͷࣄޙΛٻΊΛબ͢Δ w ྫ͑ࠂΛྫʹڍ͛Δͱɺࠂ͕ΫϦοΫ͞ΕͨɺΫϦοΫ͞ Εͳ͔ͬͨͱ͍͏ʹྨͰ͖ΔͷͰࣄલϕϧψʔΠ ʢBer(µ): µ∈[0, 1]ʣͱͳΔ w ܭࢉΛ؆ૉʹ͢ΔͨΊɺϕϧψʔΠͷڞࣄલͰ͋Δ#FUB Λ༻͍Δ
5IPNQTPO4BNQMJOHͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤iʹ͍ͭͯni ←0, mi ←0 2:
for i = 1, 2,...,T do 3: ^μi = Beta(mi+α, (ni-mi)+β) 4: i ← argmaxi∈{1,2,...,K}^μi 5: iΛͻ͍ͯใुXi(t)∈{0,1}Λ؍ଌ͢Δ 6: ni ← ni + 1, mi ← mi + Xi(t) 7: end for
ࠓ͢͜ͱ w 6;06ͷࠂϨίϝϯυ w ଟόϯσΟοτ w 6;06ͷࠂϨίϝϯυͷద༻ͱղ๏ w ద༻݁Ռ w
·ͱΊͱࠓޙͷ՝
6;06ͷࠂϨίϝϯυͷద༻ w ଟόϯσΟοτΛࠂϨίϝϯυΞϧΰϦζϜͱͯ͠ 6;06ద༻ w ͕ࠂʹରԠ͢Δ w ํࡦͱͯ͠5IPNQTPO4BNQMJOHΛ࠾༻ w ࣄલͱࣄޙͦͷࠂͷ$53ͷ
w ຖͷ৴݁Ռ $53 Λࣄલͱͯ͠ࣄޙΛܭࢉ͠ɺ ৴͢ΔࠂΛબ͢Δ w #FUBͷܭࢉ࣌ʹ͓͚ΔЋ ЌώϡʔϦεςΟοΫʹܾΊ ଧͪ
5IPNQTPO4BNQMJOHΛબΜͩ༁ w όονߋ৽ʹؤ݈Ͱ͋Δ w ຊ൪ӡ༻͢ΔࡍʹσʔλαΠζ͔Β͘Δܭࢉྔͷ੍ʹΑΓ ใु݁ՌΛϦΞϧλΠϜͰ؍ଌ͢Δ͜ͱ͕͘͠Ԇͯ͠ߋ৽ ͢Δ͜ͱ͕ଟ͍ w ͕ɺͷબ࣌ʹʮظ࠷େͱͳΔ֬ʹ͕ͨͬͯ͠ϥϯμ ϜͰબ͢ΔʯͨΊɺՌͷѱ͍ʹภΔ͜ͱ͕গͳ͍
w ܭࢉྔ͕গͳ͍ w جຊతͳϩδοΫࣄલ͔ΒࣄޙΛ#FUBΛܭࢉ͢ Δ͚ͩͰΑ͍ͷͰܭࢉྔΛ͑ΒΕΔ w ଟ͘ͷϞσϧͰཧݶքΛୡͰ͖Δ͜ͱ͕ΒΕ͍ͯΔ
54Λࠂʹద༻ͨ͠߹ͷٙࣅίʔυ ύϥϝʔλ: α>0, β>0 1: ֤ࠂiʹ͍ͭͯ impressioni ←0, clicki ←0
2: for i = 1, 2,...,T do 3: ^μi = Beta(clicki+α, (impressioni-clicki)+β) 4: i←argmaxi∈{1,2,…,K}^μi*CPCi*1000 5: ࠂάϧʔϓiΛͻ͍ͯใुXi(t)∈{0, 1} Λ؍ଌ͢Δ 6: impressioni ←impressioni+1, clicki ←clicki+Xi(t) 7: end for
Ћ ЌͬͯͲ͏ܾΊΕ͍͍ͷʁ w ࣄલʢ৽نʹೖߘ͞Εͨࠂʣ͕ͳ͍ࠂʹ͓͍ ͯҙຯΛͳ͢ w ؾΛ͚ͭΔ͖ͱ͜ΖฏۉͱࢄͰΦʔμʔ͕มΘ Δ
ϨίϝϯυγεςϜߏ ࠂϚελ ৴ϩά ৴αʔόɾ࣮ੵूܭαʔό ৴σʔλ 0O.FN $BDIF ৴σʔλ࡞όον 0O.FN $BDIF
.FEJB "EWFSUJTFS
(PWT1ZUIPO w ݁ہͷͱ͜Ζ5IPNQTPO4BNQMJOHࣄલ͔Βࣄ ޙΛٻΊΔͱ͖ʹ#FUBΛܭࢉ͢Δ͚ͩͰΑ͍ w ߴͳֶతͳܭࢉٻΊΒΕͳ͍ w ѹతʹ(PͰܭࢉ͢Δ΄͏͕ճͷόονʹΑΔ࣮ߦ ࣌ؒΛ͘Ͱ͖Δ w
όϯσΟοτΛղ͘ͱ͖ʹͷߋ৽ִؒΛͳ Δ͘খ͘͞ͳΔ΄͏͕ྑ͍ w ͦͷͨΊ(PΛ࠾༻࣮ͯ͠
(PWT1ZUIPO ࣮ߦ࣌ؒ T /
(P 1ZUIPO
ద༻݁Ռ w 6;06Λಋೖ͍ͯ͠ΔϝσΟΞʹద༻ w ৴͍ͯ͠Δɺ͔ͭɺJNQSFTTJPO͕ଟ͍ϝσΟΞ w 6;06શମͷF$1.Λ΄Ͳ্Ͱ͖ͨ
·ͱΊͱࠓޙͷ՝ w తͱ͍ͯͨ͠ɺܭࢉྔͷݮٴͼF$1.ͷ্ΛୡͰ͖ͨ w ϋΠύʔύϥϝʔλͷ࠷దԽ w Ћ ЌͷΛࠓώϡʔϦεςΟοΫʹఆ͍ٛͯ͠Δঢ়ଶͳͷͰ ͝ͱͷ࠷దͳΛٻΊ͍ͨ w
બ͢Δͷߋ৽සͷ࠷దԽ w શͯͷʹ͓͍ͯ࣌ؒݻఆͰΛߋ৽͍ͯ͠ΔͷͰ͝ͱͷ࠷ దͳߋ৽සΛٻΊ͍ͨ w $73ΛՃຯͨ͠ͷ༧ଌ w $53ͷΈ༧ଌͨ͠ͷͰΛબΜͰ͍ΔͷͰ$73༧ଌ౿·͑ Λબ͍ͨ͠
ࢀߟจݙ w ྛ ΞυωοτϫʔΫʹ͓͚Δࠂ৴ܭըͷ ࠷దԽ ਓೳֶձࢽ7PM/P w ଟόϯσΟοτͷཧͱΞϧΰϦζϜ IUUQJCJTNMPSHBSDIJWFJCJT JCJT@CBOEJUQEGT
w ຊଟ३ʰόϯσΟοτͷཧͱΞϧΰϦζ Ϝʱߨஊࣾ