Kaggle Google Quest Q&A Labeling - 23rd place solution
Shuhei Goda
February 28, 2020
Transcript
©2020 Wantedly, Inc.
23rd place solution: Kaggle Google Quest Q&A Labeling retrospective
Feb 28, 2020 - Shuhei Goda - @jy_msc

Team - The Hand
・Shuhei Goda (@jy_msc), Visit Engineering Team at Wantedly
・Naomichi Agata (@agatan_), People Engineering Team at Wantedly

Model Pipeline
・Text data: question_title, question_body, answer
・Pre-process settings: HTML unescape, head+tail truncation
・Pre-process (Q and A): question_title, question_body, answer -> Bert-base uncased
・Pre-process (only Q): question_title, question_body -> Bert-base uncased
・Bert settings: 3-fold GroupKFold, BCE + margin ranking loss, 3 epochs
・LightGBM settings: max_depth=1, lr=0.1
・LightGBM meta features: text length, stackexchange metadata
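
The pipeline lists a 3-fold GroupKFold split. The sketch below shows one way such a split can be built so that no group appears in both train and validation. The grouping key (here a hypothetical question ID per row) and the greedy fold assignment are assumptions; the slides do not say which key was used.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits=3):
    """Yield (train_idx, valid_idx) pairs where no group is split across the two."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)

    # Greedy assignment: largest groups first, each into the smallest fold so far.
    fold_of = {}
    sizes = [0] * n_splits
    for group in sorted(members, key=lambda g: -len(members[g])):
        fold = sizes.index(min(sizes))
        fold_of[group] = fold
        sizes[fold] += len(members[group])

    for fold in range(n_splits):
        valid = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, valid


# Hypothetical group labels: rows that share the same question.
groups = ["q1", "q1", "q2", "q3", "q3", "q4"]
for train_idx, valid_idx in group_k_fold(groups, n_splits=3):
    print(train_idx, valid_idx)
```

Grouping matters here because several answers can belong to the same question, and a plain KFold would leak question text between train and validation.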

Pre-Process
・Unescape HTML strings
https://www.kaggle.com/c/google-quest-challenge/discussion/…
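
A minimal sketch of the unescaping step, assuming Python's standard `html` module is an acceptable stand-in for whatever the team actually used. The function name and sample string are hypothetical.

```python
import html


def unescape_html(text: str) -> str:
    """Turn HTML entities such as &lt; or &amp; back into plain characters."""
    return html.unescape(text)


sample = "if a &lt; b &amp;&amp; b &gt; c"
print(unescape_html(sample))  # if a < b && b > c
```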

Pre-Process
・Concatenate and trim the text data
  ・[CLS] + question_title + [SEP] + question_body + [SEP] + answer
  ・If question_body and answer exceed the specified length, trim the same amount from both
https://arxiv.org/abs/…
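
The concatenation and trimming rule can be sketched as follows. This is an illustration under assumptions, not the team's code: inputs are already-tokenized lists, the title is short enough to be kept whole, the overflow is split evenly between body and answer, and each is cut from the middle with a hypothetical `head_ratio` of 0.5.

```python
def head_tail(tokens, keep, head_ratio=0.5):
    """Keep `keep` tokens: part from the head, the rest from the tail."""
    if len(tokens) <= keep:
        return list(tokens)
    head = int(keep * head_ratio)
    tail = keep - head
    return list(tokens[:head]) + (list(tokens[-tail:]) if tail else [])


def build_input(title, body, answer, max_len=512):
    """Build [CLS] title [SEP] body [SEP] answer within max_len tokens."""
    budget = max_len - 3 - len(title)  # room left after 3 special tokens and the title
    if budget < 0:
        raise ValueError("title alone exceeds max_len")
    overflow = max(0, len(body) + len(answer) - budget)
    # Trim the same amount from both; if one side is too short, the other absorbs the rest.
    cut_body = min(len(body), overflow // 2)
    cut_answer = min(len(answer), overflow - cut_body)
    cut_body = min(len(body), overflow - cut_answer)
    body = head_tail(body, len(body) - cut_body)
    answer = head_tail(answer, len(answer) - cut_answer)
    return ["[CLS]"] + list(title) + ["[SEP]"] + body + ["[SEP]"] + answer


title = ["t0", "t1"]
body = [f"b{i}" for i in range(10)]
answer = [f"a{i}" for i in range(10)]
print(build_input(title, body, answer, max_len=15))
```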

Model Architecture
・Bert-base (uncased)
  ・Use the outputs of the last 4 hidden layers (https://arxiv.org/abs/1905.05583)
  ・Use the output of the [SEP] token between Q and A
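
One possible reading of this architecture is sketched below in plain Python, with nested lists standing in for tensors. How the team combined the layers is not stated, so concatenation, the choice of token positions (first token plus the Q/A [SEP]), and all function names are assumptions.

```python
def qa_sep_position(tokens):
    """Index of the [SEP] that separates question_body from answer (the second [SEP])."""
    seps = [i for i, tok in enumerate(tokens) if tok == "[SEP]"]
    return seps[1]


def pool_features(hidden_states, positions, n_last=4):
    """Concatenate the vectors at `positions` taken from the last `n_last` layers.

    hidden_states[layer][token] is a list of floats.
    """
    features = []
    for layer in hidden_states[-n_last:]:
        for pos in positions:
            features.extend(layer[pos])
    return features


tokens = ["[CLS]", "title", "[SEP]", "body", "[SEP]", "answer"]
# Toy hidden states: 5 layers, 6 tokens, 2 dimensions, value = [layer, token].
hidden_states = [[[float(l), float(t)] for t in range(len(tokens))] for l in range(5)]
features = pool_features(hidden_states, [0, qa_sep_position(tokens)])
print(len(features))  # 4 layers * 2 positions * 2 dims = 16
```

The resulting feature vector would then feed a linear head with one output per target.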

Loss function
・Label weight
  ・Smaller weight for tasks that look easy, larger weight for tasks that look imbalanced and hard
  ・Tried searching for the weights with gpyopt, but the simple approach shown on the slide worked best
  With label weight:    Public: 0.45979, Private: 0.41440
  Without label weight: Public: 0.43455, Private: 0.40602
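
A minimal sketch of a per-label weighted BCE for one sample. The actual weights are not in the transcript, so the values below are hypothetical placeholders, and averaging over the number of labels is an assumption.

```python
import math


def weighted_bce(probs, targets, weights, eps=1e-7):
    """Binary cross-entropy per label, scaled by a per-label weight, averaged over labels."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(weights)


# Hypothetical weights: 1.0 for an "easy" label, 3.0 for an imbalanced, harder one.
probs = [0.5, 0.5]
targets = [1.0, 0.0]
weights = [1.0, 3.0]
print(weighted_bce(probs, targets, weights))
```

Soft targets in [0, 1] work unchanged with this formula.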

Loss function
・BCE + margin ranking loss (1 : 1)
  ・Split the mini-batch into two and compute the margin ranking loss
  BCE + margin ranking loss: Public: 0.45979, Private: 0.41440
  BCE only:                  Public: 0.44006, Private: 0.40668
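
The combined loss can be sketched for a single target column as below. This assumes the i-th item of the first half is paired with the i-th item of the second half, that tied targets are skipped, and that the margin is 0.1; none of these details appear in the slides.

```python
import math


def bce(scores, targets, eps=1e-7):
    """Mean binary cross-entropy; scores are probabilities in (0, 1)."""
    total = 0.0
    for p, y in zip(scores, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(scores)


def split_batch_ranking_loss(scores, targets, margin=0.1):
    """Margin ranking loss between the two halves of a mini-batch.

    With an odd batch size the last item is left unpaired.
    """
    half = len(scores) // 2
    losses = []
    for i in range(half):
        j = i + half
        if targets[i] == targets[j]:
            continue  # no ordering to learn from a tie (assumption)
        sign = 1.0 if targets[i] > targets[j] else -1.0
        losses.append(max(0.0, -sign * (scores[i] - scores[j]) + margin))
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(scores, targets, margin=0.1):
    """BCE and ranking loss summed with a 1 : 1 ratio."""
    return bce(scores, targets) + split_batch_ranking_loss(scores, targets, margin)


print(combined_loss([0.9, 0.2, 0.1, 0.8], [1.0, 0.0, 0.0, 1.0]))
```

The ranking term penalizes pairs whose predicted order disagrees with the target order, which lines up with a rank-based metric such as Spearman's correlation.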

Training
・Question Model
  ・Solve the question-related tasks using the question text only
  ・Since the input is Q only, less of Q is truncated (more of Q's information is kept)
  Q model + Q and A model:          Public: 0.45979, Private: 0.41440
  Q and A model × 2 (seed average): Public: 0.44298, Private: 0.40613

Post-Process
・LightGBM
  ・max_depth=1, lr=0.1
  ・meta features
    ・text length (question, answer)
    ・meta data from stackexchange (Score, View, FavoriteCount, …)
  LightGBM:                             Public: 0.45979, Private: 0.41440
  Simple binning without meta features: Public: 0.45282, Private: 0.41387

Why did we use LightGBM?
1. Simple binning method
  ・Noticed that discretizing the predictions improves Spearman's correlation
  ・Set a bin size per target in advance, then bin the predictions
  ・With the bin sizes fixed, take a weighted average of Bert's per-epoch outputs (the weights are optimized)
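
The effect of discretization on Spearman's correlation can be reproduced on a toy example. The sketch below uses a hand-written Spearman with average ranks for ties and an equal-width binning rule; the data, the bin count, and the binning rule itself are assumptions for illustration only.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant input: correlation is undefined, reported as 0 here
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


def bin_predictions(preds, n_bins):
    """Map each prediction in [0, 1] to the lower edge of its equal-width bin."""
    return [min(int(p * n_bins), n_bins - 1) / n_bins for p in preds]


targets = [0, 0, 0, 1, 1, 1]          # heavily tied, like the competition labels
preds = [0.1, 0.3, 0.2, 0.6, 0.8, 0.7]
print(spearman(preds, targets))                      # raw predictions
print(spearman(bin_predictions(preds, 2), targets))  # binned predictions
```

Because the targets contain many ties, ordering noise within a tied group only hurts the raw score; binning removes that noise by creating matching ties in the predictions.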
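
Why did we use LightGBM?
2. Optimize bin-size and weights
  ・Wanted to use optimal bin sizes as well
  ・Tried optimizing the bin sizes and the weights jointly, but it did not work well
  ・The optimal bin size depends on the shape of the prediction distribution. The average of each fold's optimal bin size diverges from the bin size that is optimal for the predictions after weighted averaging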
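
Why did we use LightGBM?
3. LightGBM
  ・Want to optimize the bin sizes and the weights jointly
  ・Want to use meta features as well
  ・GBDT splits the data and assigns an optimal value to each resulting region
    → With shallow trees, it may be able to discretize in the same way as binning
  (Figure panels: max_depth=2, max_depth=8)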
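
To illustrate why shallow trees behave like binning, the sketch below boosts depth-1 regression stumps on a single feature in plain Python. It is a hand-rolled stand-in, not LightGBM; the squared-error objective, the toy data, and the function names are all assumptions. Each stump adds one step, so after `n_rounds` rounds the prediction takes at most `n_rounds + 1` distinct values.

```python
def fit_stump(x, residual):
    """Best single threshold on x under squared error: (threshold, left_value, right_value)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]


def boost_stumps(x, y, n_rounds=3, lr=0.1):
    """Gradient boosting with depth-1 trees; returns the in-sample predictions."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        thr, lv, rv = fit_stump(x, residual)
        pred = [p + lr * (lv if xi <= thr else rv) for xi, p in zip(x, pred)]
    return pred


# x plays the role of a Bert prediction, y the target.
x = [i / 10 for i in range(10)]
y = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
pred = boost_stumps(x, y, n_rounds=3, lr=0.1)
print(sorted(set(round(p, 9) for p in pred)))
```

Deeper trees or more rounds produce more distinct output levels, which matches the contrast between the max_depth=2 and max_depth=8 panels.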
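
Why did we use LightGBM?
4. LightGBM (parameter tuning)
  ・The more the predictions are discretized, the better the score, so the model structure should be as simple as possible
  ・Split the train data to find the best parameters
  ・The smallest max_depth and the largest possible lr gave the best score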

Didn't work for us
・Setting sample weights
・Placing the host word at the head of the input sequence
・Adding new tokens
・Using Bert-base cased
・Removing tex code blocks by brute force

Links
Discussion: https://www.kaggle.com/c/google-quest-challenge/discussion/129904#742302
Kernel: https://www.kaggle.com/shuheigoda/23th-place-solusion

We are hiring!
https://www.wantedly.com/projects/375150