Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Offline A/B testing for Recommender Systems
Search
alpicola
November 20, 2018
Technology
2.2k
0
Share
Offline A/B testing for Recommender Systems
alpicola
November 20, 2018
More Decks by alpicola
See All by alpicola
[AEON TECH HUB #24] お客様の長期的興味の理解に向けて
alpicola
0
170
商品レコメンドでのexplicit negative feedbackの活用
alpicola
2
980
Recommending What Video to Watch Next: A Multitask Ranking System
alpicola
1
950
Kibanaを用いたアクセスログ調査と解析 / Access Log Analysis Using Kibana
alpicola
0
1k
Other Decks in Technology
See All in Technology
JTCでRedmine利用者2700人を実現した手法 第二部
nobuonakamura
0
150
JaSSTに関わることで変わった人生観 #jasstnano
makky_tyuyan
0
160
ワールドカフェ再び、そしてゴール・ルール・ロール・ツール / World Café Revisited, and the Goals-Rules-Roles-Tools
ks91
PRO
0
190
社内RAGの導入で気を付けたポイント
yakumo
1
130
Purview Endpoint DLP 動かしてみた
kozakigh
1
460
マンション備え付けのネットワークとLTE回線を組み合わせた ネットワークの安定化の考案
harutiro
1
140
20260516_SecJAWS_Days
takuyay0ne
2
540
"スキルファースト"で作る、AIの自走環境
subroh0508
1
650
業務に残された「良くない型」で考える「TypeScriptの難しさ」
sajikix
3
880
AIのために、AIを使った、Effect-TSからの脱却 〜テストを活用した安全なリファクタリングの進め方〜
bitkey
PRO
1
180
そのSLO 99.9%、本当に必要ですか? 〜優先度付きSLOによる責任共有の設計思想〜 / Is that 99.9% SLO really necessary? Design philosophy of shared responsibility through prioritized SLOs
vtryo
0
880
[みん強]AIの価値を最大化するデータ基盤戦略:Self-Service型Data Meshへの転換とAgentic AI Meshに向けた取り組み with Snowflake他
y_matsubara
1
160
Featured
See All Featured
The SEO Collaboration Effect
kristinabergwall1
1
450
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
530
[SF Ruby Conf 2025] Rails X
palkan
2
1k
GraphQLの誤解/rethinking-graphql
sonatard
75
12k
Navigating Algorithm Shifts & AI Overviews - #SMXNext
aleyda
1
1.2k
Crafting Experiences
bethany
1
150
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.3k
Music & Morning Musume
bryan
47
7.2k
SEOcharity - Dark patterns in SEO and UX: How to avoid them and build a more ethical web
sarafernandez
0
180
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.3k
Faster Mobile Websites
deanohume
310
31k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
4k
Transcript
Offline A/B testing for Recommender Systems ͯͳ ాத (alpicola) @
จಡΈձ 11/19 1
Offline A/B testing for Recommender Systems — CriteoͷWSDM'18ͷจ — SpotifyͷRecSys'18จͰݴٴ
2
Offline A/B testing for Recommender Systems — CriteoͷWSDM'18ͷจ — SpotifyͷRecSys'18จͰݴٴ
— ΫοΫύου։࠵ͷಡΈձͰ͢Ͱʹհ͞Ε͍ͯͨ — ͕ɺվΊͯ۷ΓԼ͕͛ͨͰ͖Εͱࢥ͍·͢ 3
ΦϑϥΠϯABςετ? — ΦϯϥΠϯͰߦ͏ABςετ࣌ؒͱ͕͔͔ۚΔ — ΦϑϥΠϯͰͦΕʹ͍ۙධՁ͕ߦ͑ΕΞϧΰϦζ ϜվળͷαΠΫϧΛߴԽͰ͖Δ — Ͱਫ਼? ! 4
ϩάʹجͮ͘ΦϑϥΠϯධՁͷݚڀ — Counterfactual estimationͱ͔off-policy estimationͱ ݺΕΔ — WSDM'15ͷνϡʔτϦΞϧ — SIGIR'16ͷνϡʔτϦΞϧ
— ධՁ͚ͩͰͳֶ͘शͷతؔʹ͏͜ͱͰ͖Δ — ͜ͷจͰධՁͷΈΛѻ͏ 5
จͷߩݙ — ΦϑϥΠϯABςετͰ༻͍Δใुͷਪఆख๏NCISͷ ͋Δछͷ࠷దੑΛࣔ͢ — ͜ͷݟʹج͍ͮͯNCISͷ֦ுPieceNCISͱ PointNCISΛఏҊ — ΦϯϥΠϯABςετ݁Ռͱͷ૬͕ؔେ্͖͘ 6
ઃఆ — Top-k ϥϯΩϯά — : ϩά — : ίϯςΩετ
— : ΞΫγϣϯ — : ใु 7
ઃఆ — : ίϯςΩετ͔ΒΞΫγϣϯΛબͿϙϦγʔ — : ݱߦͷϙϦγʔ — : ςετ͍ͨ͠ϙϦγʔ
— : ฏۉॲஔޮՌ — ͜ΕΛਪఆ͍ͨ͠ 8
ઃఆ — ΦϯϥΠϯABςετ — ͷݩͰͷϩάͱ ͷݩͰͷϩά͕͋Δ — ඪຊฏۉͰ , ͦΕͧΕਪఆ
— ΦϑϥΠϯABςετ — ͷݩͰͷϩά͔Β ਪఆ ! 9
ैདྷख๏ — Importance sampling (IS) — Normalized importance sampling (NIS)
— Doubly robust estimator (DR) — Capped importance sampling (CIS) — Normalized capped importance sampling (NCIS) ౷ܭϞϯςΧϧϩ๏ͷจ຺Ͱొ 10
Importance sampling (IS) — ! όΠΞε͕ͳ͍ — — " ʹΑΔߴόϦΞϯε
(unbounded) — όϦΞϯε͕େ͖͍ͱ ͱ ΛൺֱͰ͖ͳ͍ 11
Normalized importance sampling (NIS) Λͬͯ Λஔ͖͑ — ! ҰகਪఆྔʹͳΔ —
— " ґવͱͯ͠όϦΞϯεେ 12
Capped importance sampling (CIS) ॏΈͷ࠷େΛ ʹ (max capping) ॏΈ͕ Ҏ্ͷ߲ࣺͯΔ
(zero capping) 13
CISͷόΠΞε 14
CISͷόΠΞε — όΠΞε ͷ࣌ͷ Ͱbound͞ΕΔ — — ใु͕େ͖͍ͱ͜ΖΛऔΕΔΑ͏ʹվળ͍ͨ͠ ͕ͦ͏͢ΔͱόΠΞε͕େ͖͘ͳΔ !
15
CISͷόΠΞε Cappingͷઃఆʹ͍͍τϨʔυΦϑ͕ଘࡏ͠ͳ͍ ! 16
Normalized capped importance sampling (NCIS) NIS, CIS྆ํͷΞΠσΞΛ࣋ͪࠐΉ 17
NCISͱCISͷؔ 18
NCISͱCISͷؔ CIS͕͍࣋ͬͯͨόΠΞε Λୈೋ߲ͰϞσϧ ͍ͯ͠ΔͱݟͳͤΔ 19
NCISͱCISͷؔ (ಛʹzero cappingͷ࣌) 20
NCISͱCISͷؔ (ಛʹzero cappingͷ࣌) — ͳΒۙతʹόΠΞ ε͕ͳ͘ͳΔ ! — ͷ ,
ʹର͢Δґଘ͕খ͍࣌͞ͳͲ 21
NCISͷόΠΞε 22
NCISͷόΠΞε — ͱcappingͷ༗ແʹ૬͕ؔ͋ΔͱόΠΞε͕େ͖͘ ͳΔ ! — ަབྷҼࢠϢʔβʔͷλΠϓͳͲ͕ߟ͑ΒΕΔ (Table 1) 23
NCISͷόΠΞε 24
จͷΞΠσΞ — ͷϞσϦϯάΛάϩʔόϧ㱺ϩʔΧϧʹ — ίϯςΩετ ʹରͯ͠ہॴతͳNCIS — ͱcappingͷ૬ؔΛݮΒ͢ — Piecewise
NCIS: ׂ͞ΕͨྖҬ͝ͱʹNCIS — Pointwise NCIS: ཁૉ͝ͱʹNCIS 25
Piecewise NCIS (PieceNCIS) ίϯςΩετͷू߹ ͷׂ Λߟ͑Δ 26
Piecewise NCIS (PieceNCIS) ׂ֤ʹରͯ͠NCIS 27
ׂͷྫ దͳؔ ΛఆΊͯ ֤ Ͱ ͷ ʹର͢Δґଘ͕খ͘͞ͳΔΑ͏ʹ 28
Pointwise NCIS (PointNCIS) ཁૉ୯ҐͰׂ͢Δ (i.e. ) ಛఆͷίϯςΩετʹର͢Δαϯϓϧ͘͝গͳ͍ͷ ͰૉʹNCISΛద༻Ͱ͖ͳ͍ 29
Pointwise NCIS (PointNCIS) — ΞΫγϣϯʹ͍ͭͯपลԽ͢Δ ͱਖ਼֬ʹٻΊΒΕΔ — ΞΫγϣϯͷ͕ଟ͍ͱܭࢉ͕ߴίετ ! —
ΛαϯϓϦϯάͰٻΊΔ 30
Midzuno-Sen method 1. Λαϯϓϧ 2. Λ ͔Β ͳͷ͕ಘΒΕΔ·Ͱαϯϓϧ 3. Λ
͔Βαϯϓϧ 4. Λฦ͢ ͜͏ͯ͠ಘΒΕΔΛ ͱॻ͘ 31
Pointwise NCIS (PointNCIS) — ͷ͏ͪ ͕ ͷσʔλແࢹͰ͖Δ — ใु͕εύʔεͳ࣌ʹޮతʹܭࢉͰ͖Δ !
32
࣮ݧ — ϓϩϓϥΠΤλϦͷσʔληοτ — 39छɺ߹ܭͰઍԯ݅ͷϩάσʔλ — ΫϦοΫϕʔεͷใु (εύʔε͔ͭࢄେ) — ରCIS,
NCIS, PieceNCIS, PointNCIS ( ) — IS, NISόϦΞϯε͕ߴ͗͢ΔͷͰআ֎ 33
ΦϯϥΠϯʗΦϑϥΠϯABςετͷ૬ؔ 34
ద߹ͱِӄੑ ʮ ͕ ΑΓΑ͍͔Ͳ͏͔ʯͷ2༧ଌͱͯ͠ݟΔ 35
࣮ݧ݁Ռͷ·ͱΊ — CIS૬͕ؔෛ — શମతʹΊͷਪఆ͕ग़͍ͯͨ (Figure 4) — CIS⇒NCISͰେ͖͘վળ —
NCIS⇒PointNCISͰِཅੑ͕͞ΒʹԼ͕Δ — ద߹NCISҎޙͦ͜·ͰΑ͘ͳΒͳ͍ — ࣮ߦʹ͓͍ͯਫ਼ʹ͓͍ͯPointNCIS͕Α͍ 36
Appendix — ͕খ͍͞ͱ ͕ cappingΛ͑Δ͜ͱ — Max cappingͰ ʹͳΔΑ͏ͳ ৽͍͠capping
͕ͱΕΔ (Lemma A.3) 37