Why Is Target Encoding Effective? (Target Encoding はなぜ有効なのか)
Shuhei Goda
November 30, 2019
Lightning talk given at 分析コンペLT会 (a lightning-talk meetup on data analysis competitions)
https://kaggle-friends.connpass.com/event/154881/
Transcript
©2019 Wantedly, Inc.
Why Is Target Encoding Effective?
分析コンペLT会, Nov 30, 2019
Shuhei Goda - @jy_msc

Self-Introduction
• Shuhei Goda (合田 周平)
• Wantedly, Inc. (since Sep 2019)
• Recommendation Team
• Kaggle Master, active under the name hakubishin
• On Twitter as @jy_msc
We are hiring! https://www.wantedly.com/projects/375150

About Talk
• Why is Target Encoding effective?
  • It is one of the standard techniques on Kaggle
  • There are cases where Target Encoding works better than Label Encoding
• Few resources explain why Target Encoding produces good results
• This talk presents my own interpretation of why Target Encoding is effective
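
What is Target Encoding?
• A technique that converts a categorical variable into numeric values using the target variable
• Each level of the categorical variable is replaced by the expected value of the target variable at that level
• In general, the more levels the variable has, the larger the expected benefit
For the caveats and implementation details of Target Encoding, please check the Kaggle book (Kaggle本)!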
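
As a minimal sketch of the definition above, the following replaces each level by the mean of the target observed for that level. The function name `target_encode` and the toy values are made up for illustration. It is the naive in-sample version; the caveats the slide defers to the Kaggle book (such as computing the means out of fold) are deliberately left out.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category by the mean target observed for that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return [means[c] for c in categories], means


# Hypothetical toy data (not from the slides).
x = ["A", "B", "A", "C", "B", "D"]
y = [62.0, 18.0, 58.0, 50.0, 22.0, 10.0]

encoded, level_means = target_encode(x, y)
print(level_means)
print(encoded)
```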

Why is it effective?
• It has the effect of simplifying the model
  • From here on, GBDT is used as the example

Sample data used
• The explanation uses the following data
  • The target variable y is continuous
  • The explanatory variable x is a categorical variable with 4 levels, x = {A, B, C, D}
    • E[y|x=A]=60, E[y|x=B]=20, E[y|x=C]=50, E[y|x=D]=10
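
GBDT review
Dataset: $D = \{(x_i, y_i)\}_{i=1}^{n}$ $(x_i \in \mathbb{R}^m,\ y_i \in \mathbb{R})$
Additive model: $\hat{y}_i = \sum_{m=1}^{M} f_m(x_i) = \sum_{m=1}^{M} w_m(x_i)$
Loss function: $L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{m=1}^{M} \Omega(f_m)$, where $\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$
$w_m(x)$, $T$, and $M$ denote the leaf weight of the $m$-th tree, the number of leaves in a tree, and the number of trees.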
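
GBDT review
Loss function when the $m$-th tree is added (second-order approximation, constant terms dropped):
$$L^{(m)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(m-1)} + f_m(x_i)\big) + \Omega(f_m) \simeq \sum_{i=1}^{n} \Big[ g_i f_m(x_i) + \frac{1}{2} h_i f_m(x_i)^2 \Big] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \Big] + \gamma T$$
$I_j$: the set of samples assigned to the $j$-th leaf
$g_i, h_i$: first and second derivatives of the loss with respect to the prediction made by the first $m-1$ trees
gradient: $g_i = \dfrac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}}$, hessian: $h_i = \dfrac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2}$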
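
GBDT review
Once the samples have been assigned to leaves, the optimal weight of leaf $j$ is
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
and the loss at that point is
$$L^{(m)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
The best split is searched for node by node, by looking at how much the loss decreases when the samples are split.
gain: $L_{\mathrm{bef}} - (L_{\mathrm{af,left}} + L_{\mathrm{af,right}})$
The larger the gain (the difference in loss before and after the split), the better the split.
[Figure: a parent node with loss $L_{\mathrm{bef}}$ split into left and right children with losses $L_{\mathrm{af,left}}$ and $L_{\mathrm{af,right}}$]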
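
The leaf weight, leaf loss and gain above can be written down directly. This is a minimal sketch rather than any library's implementation: the function names and the four toy gradients are hypothetical, and the $\gamma T$ term is handled by counting one leaf before the split and two after it.

```python
def leaf_weight(g, h, lam=0.0):
    """Optimal leaf weight: -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)


def leaf_loss(g, h, lam=0.0):
    """Loss contributed by one leaf at its optimal weight (without gamma)."""
    return -0.5 * sum(g) ** 2 / (sum(h) + lam)


def split_gain(g, h, k, lam=0.0, gamma=0.0):
    """L_bef - (L_af,left + L_af,right) when splitting into [0:k) and [k:n)."""
    before = leaf_loss(g, h, lam) + gamma  # T = 1 leaf
    after = leaf_loss(g[:k], h[:k], lam) + leaf_loss(g[k:], h[k:], lam) + 2 * gamma
    return before - after


# Hypothetical gradients and hessians for four samples.
g = [-10.0, -10.0, -60.0, -60.0]
h = [1.0, 1.0, 1.0, 1.0]

print(leaf_weight(g, h))      # weight of the unsplit node
print(split_gain(g, h, k=2))  # gain of splitting after the second sample
```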
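
GBDT review
When the loss function is MSE
Loss function: $l(y_i, \hat{y}_i) = \frac{1}{2}(y_i - \hat{y}_i)^2$
gradient: $g_i = \dfrac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}} = \hat{y}_i^{(m-1)} - y_i$, hessian: $h_i = \dfrac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2} = 1$
Hence the weight of leaf $j$ is the mean residual of the samples assigned to leaf $j$ (taking $\lambda = 0$):
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} = -\frac{\sum_{i \in I_j} (\hat{y}_i^{(m-1)} - y_i)}{\sum_{i \in I_j} 1}$$
Numerator: the sum of the residuals (true value minus the prediction after $m-1$ trees). Denominator: the number of samples.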
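
A quick numerical check of this statement, as a sketch with made-up values: with the MSE loss, the gradient is prediction minus truth and the hessian is 1, so the optimal leaf weight (with lambda = 0) coincides with the mean residual of the leaf.

```python
def mse_grad_hess(y_true, y_prev):
    """Gradient and hessian of 1/2 * (y - y_hat)^2 with respect to y_hat."""
    g = [p - t for t, p in zip(y_true, y_prev)]
    h = [1.0] * len(y_true)
    return g, h


# Hypothetical samples that fall into one leaf; previous prediction is 0.
y_true = [58.0, 62.0, 61.0, 59.0]
y_prev = [0.0, 0.0, 0.0, 0.0]

g, h = mse_grad_hess(y_true, y_prev)
weight = -sum(g) / sum(h)  # optimal leaf weight with lambda = 0
mean_residual = sum(t - p for t, p in zip(y_true, y_prev)) / len(y_true)

print(weight, mean_residual)
```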

GBDT settings
• Consider a simple model.
  • loss_func = 'MSE'
  • eta = 1 → step size
  • iteration = 1 → consider only the first tree
  • tree_method = 'exact' → brute-force exhaustive search
  • base_score = 0 → the initial prediction starts from 0
  • lambda = 0
  • gamma = 0
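
For readers who want to try these settings, here is one possible mapping onto XGBoost parameter names. The slides do not say which library was used, so the choice of XGBoost, the objective string `reg:squarederror` (assuming the squared-error loss from the MSE derivation), and the variable names are all assumptions.

```python
# Assumed mapping of the slide's settings to XGBoost parameter names.
params = {
    "objective": "reg:squarederror",  # squared-error loss (assumption)
    "eta": 1.0,                       # step size
    "tree_method": "exact",           # exhaustive split search
    "base_score": 0.0,                # initial prediction starts from 0
    "lambda": 0.0,                    # no L2 penalty on leaf weights
    "gamma": 0.0,                     # no penalty per leaf
}
NUM_BOOST_ROUND = 1  # "iteration = 1": only the first tree

# With xgboost installed and a DMatrix `dtrain` prepared, this would be used as:
#   import xgboost as xgb
#   booster = xgb.train(params, dtrain, num_boost_round=NUM_BOOST_ROUND)
print(params)
```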
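
Using Label Encoding
• Label-encode the categorical variable in alphabetical order
• Sort the samples by the feature value
[Figure: the samples before and after sorting; slide note: "a sad-looking graph..."]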

Using Label Encoding (depth = 1)
Loss before splitting: L1 = -48797. Candidate splits of the label-sorted samples:
• {A} | {B, C, D}: L2,left = -35522, L2,right = -21391, L2 = -56913
• {A, B} | {C, D}: L2,left = -31832, L2,right = -17951, L2 = -49783
• {A, B, C} | {D}: L2,left = -56097, L2,right = -996, L2 = -57093
Splitting at {A, B, C} | {D} looks best.
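
The depth-1 search can be reproduced approximately with a noise-free stand-in for the slide's data. In this sketch every sample equals its level mean and each level has 20 samples; the sample count is an assumption (the slides do not state it), so the numbers land near the slide's values without matching them. Without noise, {A} | {B, C, D} and {A, B, C} | {D} come out tied, which fits how close the two are on the slide.

```python
# Noise-free stand-in for the slide's data: every sample equals its level mean.
LEVEL_MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}  # from the slide
N_PER_LEVEL = 20  # assumption: the slide does not state the sample count


def leaf_loss(ys):
    """Leaf loss -1/2 * G^2 / H for MSE with predictions at 0 (g = -y, h = 1)."""
    return -0.5 * sum(ys) ** 2 / len(ys)


# Label-encode alphabetically (A=0, B=1, C=2, D=3) and sort by the code.
levels = sorted(LEVEL_MEANS)
samples = [(code, LEVEL_MEANS[lv])
           for code, lv in enumerate(levels)
           for _ in range(N_PER_LEVEL)]
samples.sort()
ys = [y for _, y in samples]

root_loss = leaf_loss(ys)
candidates = {}
for k in range(1, len(samples)):
    code = samples[k][0]
    if code != samples[k - 1][0]:  # split only between different codes
        name = "".join(levels[:code]) + "|" + "".join(levels[code:])
        candidates[name] = leaf_loss(ys[:k]) + leaf_loss(ys[k:])

print("before split:", root_loss)
for name, loss in candidates.items():
    print(name, round(loss, 1), "gain:", round(root_loss - loss, 1))
```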

Using Label Encoding (depth = 2)
Tree so far: {A, B, C, D} (L1 = -48797) → {A, B, C} (L2,left = -56097) and {D} (L2,right = -996).
Candidate splits of the node {A, B, C} (L2 = -56097):
• {A} | {B, C}: L3,left = -35522, L3,right = -24589, L3 = -60111
• {A, B} | {C}: L3,left = -31832, L3,right = -24937, L3 = -56769
Splitting at {A} | {B, C} looks best.

Using Label Encoding (depth = 3)
Candidate split of the node {B, C} (L3 = -24589):
• {B} | {C}: L4,left = -4076, L4,right = -24937, L4 = -29013
Splitting here looks best, and the splitting is finished. Resulting tree:
{A, B, C, D}  L1 = -48797
  {A, B, C}  L2,left = -56097
    {A}  L3,left = -35522
    {B, C}  L3,right = -24589
      {B}  L4,left = -4076
      {C}  L4,right = -24937
  {D}  L2,right = -996

Using Target Encoding
• Target-encode the categorical variable
• Sort the samples by the feature value
[Figure: the samples before and after sorting]

Using Target Encoding (depth = 1)
Loss before splitting: L1 = -48797. Candidate splits of the samples sorted by encoded value (D, B, C, A):
• {D} | {B, C, A}: L2,left = -996, L2,right = -56097, L2 = -57093
• {D, B} | {C, A}: L2,left = -4551, L2,right = -59992, L2 = -64543
• {D, B, C} | {A}: L2,left = -21391, L2,right = -35522, L2 = -56913
Splitting at {D, B} | {C, A} looks best.

Using Target Encoding (depth = 2)
Tree so far: {A, B, C, D} (L1 = -48797) → {D, B} (L2,left = -4551) and {C, A} (L2,right = -59992).
• Node {D, B}: split {D} | {B}: L3,left = -996, L3,right = -4076, L3 = -5072
• Node {C, A}: split {C} | {A}: L′3,left = -24937, L′3,right = -35522, L′3 = -60459
Splitting at both of these points looks best, and the splitting is finished. Resulting tree:
{A, B, C, D}  L1 = -48797
  {D, B}  L2,left = -4551
    {D}  L3,left = -996
    {B}  L3,right = -4076
  {C, A}  L2,right = -59992
    {C}  L′3,left = -24937
    {A}  L′3,right = -35522

Comparison of Label Encoding and Target Encoding
Tree structure built with Label Encoding (depth 3):
{A, B, C, D}  L1 = -48797
  {A, B, C}  L2,left = -56097
    {A}  L3,left = -35522
    {B, C}  L3,right = -24589
      {B}  L4,left = -4076
      {C}  L4,right = -24937
  {D}  L2,right = -996
Tree structure built with Target Encoding (depth 2):
{A, B, C, D}  L1 = -48797
  {D, B}  L2,left = -4551
    {D}  L3,left = -996
    {B}  L3,right = -4076
  {C, A}  L2,right = -59992
    {C}  L′3,left = -24937
    {A}  L′3,right = -35522
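
The comparison can be replayed with a small greedy tree builder. This is a sketch under stated assumptions, not the author's code: the data is noise-free with 20 samples per level (the count is assumed), the loss is the MSE leaf loss with predictions starting at 0 and lambda = gamma = 0, and the helper names are hypothetical. A node keeps splitting until it holds a single feature value.

```python
LEVEL_MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}  # from the slide
N_PER_LEVEL = 20  # assumption: not stated on the slides


def leaf_loss(ys):
    """-1/2 * G^2 / H for MSE with predictions at 0 (g = -y, h = 1)."""
    return -0.5 * sum(ys) ** 2 / len(ys)


def best_split(points):
    """Index of the loss-minimising split of feature-sorted points, or None."""
    ys = [y for _, y in points]
    best_k, best_loss = None, float("inf")
    for k in range(1, len(points)):
        if points[k][0] == points[k - 1][0]:
            continue  # cannot split between equal feature values
        loss = leaf_loss(ys[:k]) + leaf_loss(ys[k:])
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k


def depth_to_separate(points):
    """Depth of the greedy tree grown until every leaf holds one feature value."""
    k = best_split(points)
    if k is None:
        return 0
    return 1 + max(depth_to_separate(points[:k]), depth_to_separate(points[k:]))


def encode(encoding):
    """Build feature-sorted (feature, y) points for a level -> value encoding."""
    return sorted((encoding[lv], LEVEL_MEANS[lv])
                  for lv in LEVEL_MEANS for _ in range(N_PER_LEVEL))


label_encoding = {lv: i for i, lv in enumerate(sorted(LEVEL_MEANS))}
target_encoding = dict(LEVEL_MEANS)  # noise-free: the level mean of y

label_depth = depth_to_separate(encode(label_encoding))
target_depth = depth_to_separate(encode(target_encoding))
print("label encoding depth:", label_depth)
print("target encoding depth:", target_depth)
```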

(Admittedly a rather contrived example, but) doesn't Target Encoding look a little more efficient?
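
What is Target Encoding doing for us?
• It makes the tree structure simpler
• With the MSE loss and in the early iterations, it has the effect of placing levels with similar residuals (gradients) closer to each other.
  → Each group of samples produced by a split has relatively similar residuals, so training is efficient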

As the number of levels grows
• The effect of Target Encoding becomes easier to notice as the number of levels increases
• Sorting the categories by residual size beforehand makes splitting more efficient.
[Figure: example splits with leaf weights w*_left and w*_right]
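
Depth needed to fully separate all levels
• Target Encoding needs a shallower depth, so the tree structure is simpler
• Below are the tree structures obtained when splitting a categorical variable with 100 levels
[Figure: two tree structures, panels "Label Encoding" and "Target Encoding"]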

Loss reduction at each depth
• Target Encoding reduces the loss more efficiently
• The more levels there are, the larger the gap to Label Encoding becomes.
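
The experiment with many levels can be sketched as follows. Everything about the data is an assumption made for illustration: 100 levels with distinct random integer means, 5 noise-free samples per level, and a tree grown level by level with the same MSE leaf loss as before (lambda = gamma = 0, predictions starting at 0). The script records the total loss after each depth for both encodings; both end at the same loss once every level is separated, and what differs is how many levels of depth it takes to get there.

```python
import random


def leaf_loss(ys):
    """-1/2 * G^2 / H for MSE with predictions at 0 (g = -y, h = 1)."""
    return -0.5 * sum(ys) ** 2 / len(ys)


def best_split(points):
    """Index of the loss-minimising split of feature-sorted points, or None."""
    ys = [y for _, y in points]
    best_k, best_loss = None, float("inf")
    for k in range(1, len(points)):
        if points[k][0] == points[k - 1][0]:
            continue
        loss = leaf_loss(ys[:k]) + leaf_loss(ys[k:])
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k


def loss_by_depth(points):
    """Total loss after each depth when every splittable node is split greedily."""
    nodes = [points]
    history = [leaf_loss([y for _, y in points])]
    while True:
        next_nodes, split_any = [], False
        for node in nodes:
            k = best_split(node)
            if k is None:
                next_nodes.append(node)
            else:
                next_nodes += [node[:k], node[k:]]
                split_any = True
        if not split_any:
            return history
        nodes = next_nodes
        history.append(sum(leaf_loss([y for _, y in n]) for n in nodes))


# Hypothetical data: 100 levels, distinct means, 5 noise-free samples each.
rng = random.Random(0)
means = [float(m) for m in rng.sample(range(1, 1000), 100)]
N_PER_LEVEL = 5

label_points = sorted((i, m) for i, m in enumerate(means) for _ in range(N_PER_LEVEL))
target_points = sorted((m, m) for m in means for _ in range(N_PER_LEVEL))

label_curve = loss_by_depth(label_points)
target_curve = loss_by_depth(target_points)
print("depth needed (label): ", len(label_curve) - 1)
print("depth needed (target):", len(target_curve) - 1)
for d in range(min(len(label_curve), len(target_curve))):
    print(d, round(label_curve[d], 1), round(target_curve[d], 1))
```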
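
Won't the model sort things out by itself if we increase the depth / number of iterations?
• Information that is clearly known to be useful is better handed to the model explicitly
• Label Encoding may still get there, but the model tends to become complex. The more levels there are, the less realistic that becomes.
• From experience, things that are clearly known to help are better handled before training.
  • The same goes for feature interactions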
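
Summary
• Target Encoding makes the model simpler
• With the MSE loss and in the early iterations, sorting the levels by residual size makes efficient splits possible.
• The more levels there are, the larger the effect of Target Encoding
• Doing the equivalent of Target Encoding with Label Encoding requires a certain depth, and that becomes less realistic as the number of levels grows.
• The model may be able to do it implicitly even without Target Encoding, but things that are clearly known to be good are better handled before they are fed to the model.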