Kaggle Google Quest Q&A Labeling - 23rd place solution
Shuhei Goda
February 28, 2020
©2020 Wantedly, Inc.
23rd place solution: Kaggle Google Quest Q&A Labeling
Feb 28, 2020 - Shuhei Goda - @jy_msc
Team - The Hand
- Shuhei Goda (@jy_msc), Visit Engineering Team at Wantedly
- Naomichi Agata (@agatan_), People Engineering Team at Wantedly
Model Pipeline
- Text data: question_title, question_body, answer
- Pre-process (Q and A): question_title, question_body, answer → Bert-base uncased
- Pre-process (Q only): question_title, question_body → Bert-base uncased
- The BERT predictions and meta features are fed into LightGBM
- Pre-process settings: HTML unescaping, head+tail truncation
- BERT settings: 3-fold with GroupKFold, BCE + margin ranking loss, 3 epochs
- LightGBM settings: max_depth=1, lr=0.1
- Meta features: text length, stackexchange metadata
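The slide lists "3fold with GroupKFold" but does not say which column defines the groups. The sketch below is a dependency-free group k-fold that works in a similar spirit to scikit-learn's `GroupKFold`: the largest groups are placed first, each into the lightest fold. Grouping by question text, so that all answers to one question land in the same fold, is an assumption for illustration only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_k_fold(groups: Sequence[Hashable], n_splits: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, val_idx) pairs such that no group appears on both sides.

    Greedy balancing: the largest groups are placed first, each into the fold
    that currently holds the fewest rows.
    """
    rows_by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, group in enumerate(groups):
        rows_by_group[group].append(idx)
    if len(rows_by_group) < n_splits:
        raise ValueError("need at least as many distinct groups as folds")

    fold_rows: List[List[int]] = [[] for _ in range(n_splits)]
    for rows in sorted(rows_by_group.values(), key=len, reverse=True):
        lightest = min(range(n_splits), key=lambda k: len(fold_rows[k]))
        fold_rows[lightest].extend(rows)

    splits = []
    for k in range(n_splits):
        val_idx = sorted(fold_rows[k])
        train_idx = sorted(i for j in range(n_splits) if j != k for i in fold_rows[j])
        splits.append((train_idx, val_idx))
    return splits


# Hypothetical grouping key: the question text of each row.
question_bodies = ["q1", "q1", "q2", "q3", "q3", "q3", "q4"]
for train_idx, val_idx in group_k_fold(question_bodies, n_splits=3):
    print(val_idx, sorted({question_bodies[i] for i in val_idx}))
```

Keeping every answer to the same question inside one fold stops the validation score from benefiting from questions the model has already seen in training.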
Pre-Process
- Unescape HTML strings
  Source: https://www.kaggle.com/c/google-quest-challenge/discussion
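Unescaping HTML entities can be done with Python's standard `html.unescape`. The sketch below assumes each row is a dict keyed by the competition's text columns. It only illustrates the step and is not the team's actual code.

```python
import html


def unescape_text_fields(row: dict, fields=("question_title", "question_body", "answer")) -> dict:
    """Return a copy of `row` with HTML entities such as &lt; and &quot; decoded."""
    cleaned = dict(row)
    for name in fields:
        cleaned[name] = html.unescape(cleaned[name])
    return cleaned


row = {
    "question_title": "Why is x &lt; y?",
    "question_body": "I tried &quot;a &amp;&amp; b&quot; but it fails.",
    "answer": "Use &gt;= instead.",
}
print(unescape_text_fields(row)["question_body"])  # I tried "a && b" but it fails.
```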
Pre-Process
- Concatenate and trim the text data
  - [CLS] + question_title + [SEP] + question_body + [SEP] + answer
  - If question_body and answer exceed the specified length, trim the same number of tokens from both
  Reference: https://arxiv.org/abs/…
Model Architecture
- Bert-base (uncased)
  - Use the outputs of the last 4 hidden layers (https://arxiv.org/abs/1905.05583)
  - Use the output of the [SEP] token between Q and A
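The sketch below shows the feature extraction with plain nested lists so that it runs without a deep-learning library. It assumes the input comes in a per-layer form like the `hidden_states` tuple a Hugging Face BERT model returns with `output_hidden_states=True`. Three choices are assumptions, since the slide does not specify them:
- the four layer vectors are concatenated rather than averaged;
- the second [SEP] is taken as the Q/A separator;
- the head is a single linear layer.

```python
from typing import List, Sequence

SEP_ID = 102  # id of [SEP] in the bert-base-uncased vocabulary


def qa_sep_features(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    token_ids: Sequence[int],
    n_layers: int = 4,
) -> List[float]:
    """Concatenate the last `n_layers` hidden vectors of the [SEP] between Q and A.

    hidden_states: one entry per layer (embeddings first), each [seq_len][hidden].
    With the layout [CLS] title [SEP] body [SEP] answer, the second [SEP]
    separates the question from the answer.
    """
    sep_positions = [i for i, tok in enumerate(token_ids) if tok == SEP_ID]
    if len(sep_positions) < 2:
        raise ValueError("expected at least two [SEP] tokens")
    qa_sep = sep_positions[1]
    features: List[float] = []
    for layer in hidden_states[-n_layers:]:
        features.extend(layer[qa_sep])
    return features


def linear_head(features: Sequence[float], weights, bias) -> List[float]:
    """One logit per target; `weights` is [n_targets][len(features)] (hypothetical head)."""
    return [sum(w * f for w, f in zip(row, features)) + b for row, b in zip(weights, bias)]
```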
Loss function
- Label weight
  - Give smaller weights to tasks that look easy and larger weights to imbalanced tasks that look hard
  - We also tried searching the weights with GPyOpt, but this simple heuristic worked best
- With label weight: Public 0.45979, Private 0.41440
- Without label weight: Public 0.43455, Private 0.40602
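A per-label weight simply scales each target column's binary cross-entropy before averaging. The sketch below assumes soft targets in [0, 1] and takes a mean over all elements. The weights in any usage, such as 0.5 for an "easy" target and 2.0 for an imbalanced one, would be hypothetical.

```python
import math
from typing import Sequence


def label_weighted_bce(
    probs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    label_weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy where each target column has its own weight."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y, w in zip(p_row, y_row, label_weights):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
            count += 1
    return total / count
```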
Loss function
- BCE + margin ranking loss (1 : 1)
  - Split each mini-batch into two halves and compute the margin ranking loss between them
- BCE + margin ranking loss: Public 0.45979, Private 0.41440
- BCE only: Public 0.44006, Private 0.40668
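A minimal sketch of the combined loss. Row `i` of the mini-batch is paired with row `i + half`, and each pair is scored with max(0, -sign * (s1 - s2) + margin), the same form as PyTorch's `MarginRankingLoss`. The following are assumptions, not taken from the slides:
- the margin value (0.1);
- applying the ranking term to probabilities rather than logits;
- skipping pairs with tied targets.

```python
import math


def bce(probs, targets, eps=1e-7) -> float:
    """Mean binary cross-entropy over all elements."""
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)
            total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
            count += 1
    return total / count


def split_batch_margin_ranking(probs, targets, margin=0.1) -> float:
    """Pair row i with row i + half and apply max(0, -sign * (s1 - s2) + margin)."""
    half = len(probs) // 2  # with an odd batch, the last row stays unpaired
    losses = []
    for i in range(half):
        for s1, s2, y1, y2 in zip(probs[i], probs[i + half], targets[i], targets[i + half]):
            if y1 == y2:
                continue  # tied targets give no ranking signal; skipped here
            sign = 1.0 if y1 > y2 else -1.0
            losses.append(max(0.0, -sign * (s1 - s2) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(probs, targets, margin=0.1) -> float:
    """BCE + margin ranking loss with 1 : 1 weighting."""
    return bce(probs, targets) + split_batch_margin_ranking(probs, targets, margin)
```

The ranking term rewards getting the order of predictions right, which is what the competition's Spearman metric measures. The BCE term keeps the predictions calibrated.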
Training
- Question Model
  - Solve the question-related tasks using only the question text
  - Because the input is Q alone, less of Q is truncated, so the model sees more of the question
- Q model + Q and A model: Public 0.45979, Private 0.41440
- Q and A model × 2 (seed average): Public 0.44298, Private 0.40613
Post-Process
- LightGBM
  - max_depth=1, lr=0.1
  - meta features
    - text length (question, answer)
    - meta data from stackexchange (Score, View, FavoriteCount, …)
- LightGBM: Public 0.45979, Private 0.41440
- Simple binning without meta features: Public 0.45282, Private 0.41387
Why we used LightGBM? 1. Simple binning method
- We noticed that discretizing the predictions improves Spearman's correlation
- Bin each target using a bin size set in advance for that target
- With the bin size fixed, take a weighted average of BERT's outputs from each epoch (optimizing the weights)
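The toy example below illustrates why binning can raise Spearman's correlation when targets take only a few distinct values. Spearman's correlation is the Pearson correlation of average ranks. Collapsing near-identical predictions into one bin creates ties that mirror the ties in the target. The data, the floor-based binning rule, and the helper names are hypothetical. The per-epoch blending weights would be optimized separately; how the team did that is not specified.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of the average ranks (NaN if either side is constant)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def bin_predictions(preds: Sequence[float], bin_size: float) -> List[int]:
    """Replace each prediction by its bin index; only the ordering matters for Spearman."""
    return [math.floor(p / bin_size) for p in preds]


def blend_epochs(epoch_preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of per-epoch predictions for one target."""
    total = sum(weights)
    return [sum(w * col[i] for w, col in zip(weights, epoch_preds)) / total
            for i in range(len(epoch_preds[0]))]


targets = [0, 0, 0, 1, 1, 1]
preds = [0.10, 0.12, 0.31, 0.29, 0.80, 0.85]
print(round(spearman(targets, preds), 3))                       # 0.683
print(round(spearman(targets, bin_predictions(preds, 0.5)), 3))  # 0.707
```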
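Why we used LightGBM? 2. Optimize bin size and weights
- We wanted to find the optimal bin size as well
- Optimizing the bin size and weights jointly did not work well
- The optimal bin size depends on the shape of the predictions, so the average of the per-fold optimal bin sizes diverges from the optimum for the weighted-average predictions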
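Below is a minimal sketch of searching the bin size on one fold's predictions. The candidate grid and the floor-based binning are hypothetical. Running this per fold and averaging the winners is the procedure the slide reports as unreliable: the best size depends on the shape of the predictions it was tuned on.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def _ranks(xs: Sequence[float]) -> List[float]:
    # Average ranks for ties.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b) -> float:
    ra, rb = _ranks(a), _ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = math.sqrt(sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb))
    return cov / den if den else float("nan")


def best_bin_size(preds, targets, candidates: Iterable[float]) -> Tuple[float, float]:
    """Grid-search the bin size that maximizes Spearman; NaN scores (constant bins) are skipped."""
    best = (float("nan"), -math.inf)
    for size in candidates:
        score = spearman(targets, [math.floor(p / size) for p in preds])
        if not math.isnan(score) and score > best[1]:
            best = (size, score)
    return best
```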
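Why we used LightGBM? 3. LightGBM
- We wanted to optimize the bin size and weights jointly
- We also wanted to use meta features
- GBDT splits the data and assigns an optimal value to each resulting region
  → with shallow trees, it should discretize the predictions the same way binning does
[Figure: predictions with max_depth=2 vs. max_depth=8]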
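To see why shallow trees act like binning, here is a from-scratch gradient boosting with depth-1 trees and squared loss on a single feature (for example, one BERT prediction). This is not LightGBM, which the team configured through its `max_depth` and `learning_rate` parameters. Meta features are omitted. With one feature, n depth-1 trees use at most n thresholds, so the output is a step function with at most n + 1 distinct values.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(x: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Depth-1 regression tree: one split on x minimizing squared error."""
    pairs = sorted(zip(x, residuals))
    n, total = len(pairs), sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Lower squared error <=> larger i * mean_left^2 + (n - i) * mean_right^2
        gain = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    if best is None:  # all x equal: constant "stump"
        mean = total / n
        return (float("inf"), mean, mean)
    return best[1], best[2], best[3]


def fit_boosted_stumps(x, y, n_estimators=50, lr=0.1):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        thr, left, right = stump
        pred = [p + lr * (left if xi <= thr else right) for p, xi in zip(pred, x)]
    return base, lr, stumps


def predict(model, x):
    base, lr, stumps = model
    return [base + sum(lr * (l if xi <= t else r) for t, l, r in stumps) for xi in x]


x = [i / 50 for i in range(50)]
model = fit_boosted_stumps(x, [xi ** 2 for xi in x], n_estimators=5, lr=0.5)
print(len(set(predict(model, x))))  # at most 6 distinct output levels
```

Deeper trees or more estimators add more thresholds and therefore finer "bins". That is consistent with the observation, in the parameter-tuning slide, that the smallest `max_depth` scored best.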
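Why we used LightGBM? 4. LightGBM (parameter tuning)
- The more discretized the predictions, the better the score, so we kept the model structure as simple as possible
- We split the train data to find the optimal parameters
- The smallest max_depth and the largest possible lr gave the best score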
Didn't work for us
- Setting sample weights
- Prepending the words of the host field to the input
- Adding new tokens
- Using Bert-base cased
- Brute-force removal of TeX code blocks
Links
- Discussion: https://www.kaggle.com/c/google-quest-challenge/discussion/129904#742302
- Kernel: https://www.kaggle.com/shuheigoda/23th-place-solusion
We are hiring! https://www.wantedly.com/projects/375150