
Why Is Target Encoding Effective? (Target Encoding はなぜ有効なのか)

Shuhei Goda
November 30, 2019

Transcript

  1. ©2019 Wantedly, Inc. Self-Introduction
     • Shuhei Goda (合田 周平)
     • Wantedly, Inc. (since Sep 2019)
     • Recommendation Team https://www.wantedly.com/projects/375150
     • Kaggle Master
     • On Twitter under the name hakubishin (@jy_msc)
     • We are hiring!
  2. About Talk
     • Why is Target Encoding effective?
     • It is one of the standard techniques on Kaggle.
     • In some cases Target Encoding works better than Label Encoding.
     • Few materials explain why Target Encoding produces good results.
     • This talk presents my own interpretation of why Target Encoding is effective.
  3. Sample data used
     • The explanation uses data like the following:
       • The target variable y is continuous.
       • The explanatory variable x is a categorical variable with 4 levels, x = {A, B, C, D}.
         • E[y|x=A]=60, E[y|x=B]=20, E[y|x=C]=50, E[y|x=D]=10
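A minimal sketch of how such sample data could be generated. The slides give only the conditional means; the 20 rows per level, the Gaussian noise with standard deviation 1, the seed, and the name `make_sample_data` are assumptions made here for illustration.

```python
import numpy as np

# Conditional means E[y|x] taken from the slide.
MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}


def make_sample_data(n_per_level=20, noise_std=1.0, seed=0):
    """Return (x, y): a 4-level categorical feature and a continuous target.

    n_per_level, noise_std and seed are illustrative assumptions,
    not values stated in the slides.
    """
    rng = np.random.default_rng(seed)
    x = np.repeat(list(MEANS), n_per_level)
    y = np.concatenate(
        [rng.normal(mean, noise_std, n_per_level) for mean in MEANS.values()]
    )
    return x, y


x, y = make_sample_data()
# Empirical per-level means should sit close to the stated E[y|x].
level_means = {lv: float(y[x == lv].mean()) for lv in MEANS}
```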
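  4. GBDT review
     Dataset: $D = \{(x_i, y_i)\}_{i=1}^{n}$ $(x_i \in \mathbb{R}^m,\ y_i \in \mathbb{R})$
     Additive model: $\hat{y}_i = \sum_{m=1}^{M} f_m(x_i) = \sum_{m=1}^{M} w_m(x_i)$
     Loss function: $L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{m=1}^{M} \Omega(f_m)$, where $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$
     $w_m(x)$ is the weight of the leaf of the $m$-th tree that $x$ falls into, $T$ is the number of leaves in a tree, and $M$ is the number of trees.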
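  5. GBDT review
     Loss function when the $m$-th tree is added (second-order expansion, dropping the term that does not depend on $f_m$):
     $$L^{(m)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(m-1)} + f_m(x_i)\big) + \Omega(f_m)$$
     $$\simeq \sum_{i=1}^{n} \Big[ g_i f_m(x_i) + \frac{1}{2} h_i f_m(x_i)^2 \Big] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$
     $$= \sum_{j=1}^{T} \Big[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \Big] + \gamma T$$
     $I_j$ is the set of samples assigned to the $j$-th leaf.
     $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to the prediction after $m-1$ trees:
     gradient: $g_i = \dfrac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}}$, hessian: $h_i = \dfrac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2}$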
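The step from this per-leaf form to the optimal leaf weight is a one-variable minimization for each leaf. A brief sketch, where $G_j$ and $H_j$ are shorthand introduced here (not in the slides) for the per-leaf sums:

```latex
\begin{aligned}
G_j &= \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i \\
\frac{\partial L^{(m)}}{\partial w_j} &= G_j + (H_j + \lambda)\, w_j = 0
  \;\Longrightarrow\; w_j^* = -\frac{G_j}{H_j + \lambda} \\
L^{(m)}\Big|_{w = w^*} &= \sum_{j=1}^{T} \Big[ G_j w_j^* + \frac{1}{2} (H_j + \lambda) (w_j^*)^2 \Big] + \gamma T
  = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
\end{aligned}
```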
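  6. GBDT review
     Given the samples assigned to leaf $j$, its optimal weight is
     $$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
     and the loss at that point is
     $$L^{(m)} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big( \sum_{i \in I_j} g_i \big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
     At each node, the best split is found by looking at how much the loss decreases when the samples are split.
     gain: $L_{bef} - (L_{af,left} + L_{af,right})$
     The larger the gain (the difference in loss before and after the split), the better the split.
     [Figure: a node {A, B, C, D} with loss $L_{bef}$ split into two children, {A, B} and {C, D}, with losses $L_{af,left}$ and $L_{af,right}$]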
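The leaf weight, leaf loss, and gain above can be sketched in a few lines. This is an illustrative sketch rather than any library's implementation; the function names `leaf_weight`, `leaf_loss`, and `split_gain` are made up here, and the gamma handling assumes that one split adds exactly one leaf to the $\gamma T$ term.

```python
import numpy as np


def leaf_weight(g, h, lam=0.0):
    """Optimal leaf weight: -sum(g) / (sum(h) + lambda)."""
    return -np.sum(g) / (np.sum(h) + lam)


def leaf_loss(g, h, lam=0.0):
    """Loss contribution of one leaf at its optimal weight (without the gamma term)."""
    return -0.5 * np.sum(g) ** 2 / (np.sum(h) + lam)


def split_gain(g, h, left_mask, lam=0.0, gamma=0.0):
    """L_bef - (L_af,left + L_af,right), minus gamma for the one extra leaf."""
    before = leaf_loss(g, h, lam)
    after = leaf_loss(g[left_mask], h[left_mask], lam) + leaf_loss(
        g[~left_mask], h[~left_mask], lam
    )
    return before - after - gamma


# Usage: four samples with y = 60, 60, 10, 10 and a previous prediction of 0
# under squared error, so g = -y and h = 1.
g = np.array([-60.0, -60.0, -10.0, -10.0])
h = np.ones_like(g)
left_mask = np.array([True, True, False, False])
gain = split_gain(g, h, left_mask)
```

Separating the two pairs of equal targets removes all remaining error in this toy case, so the gain is positive and the split would be accepted.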
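  7. GBDT review: when the loss function is MSE
     Loss function: $l(y_i, \hat{y}_i) = \frac{1}{2} (y_i - \hat{y}_i)^2$
     gradient: $g_i = \dfrac{\partial l(y_i, \hat{y}_i^{(m-1)})}{\partial \hat{y}_i^{(m-1)}} = \hat{y}_i^{(m-1)} - y_i$
     hessian: $h_i = \dfrac{\partial^2 l(y_i, \hat{y}_i^{(m-1)})}{(\partial \hat{y}_i^{(m-1)})^2} = 1$
     Hence, with $\lambda = 0$, the weight of leaf $j$ is the mean residual of the samples assigned to leaf $j$:
     $$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i} = -\frac{\sum_{i \in I_j} \big( \hat{y}_i^{(m-1)} - y_i \big)}{\sum_{i \in I_j} 1}$$
     The numerator (with its sign) is the sum of the residuals (true value minus the prediction after $m-1$ trees), and the denominator is the number of samples.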
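A small numeric check of this statement, using four made-up target values (not data from the slides) and a previous prediction of 0:

```python
import numpy as np

# Hypothetical targets for the samples in one leaf, and the prediction so far.
y = np.array([58.0, 61.0, 62.0, 59.0])
y_pred_prev = np.zeros_like(y)

# MSE loss l = 1/2 (y - y_hat)^2 gives g = y_hat - y and h = 1.
g = y_pred_prev - y
h = np.ones_like(y)

# Optimal leaf weight with lambda = 0.
w_star = -g.sum() / h.sum()

# Mean residual (true value minus previous prediction).
mean_residual = (y - y_pred_prev).mean()
```

Both quantities come out equal, which is what the slide states: with squared error and no regularization, a leaf simply predicts the average residual of its samples.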
  8. GBDT settings
     • Consider a simple model.
       • loss_func = 'MSE'
       • eta = 1 → step size
       • iteration = 1 → consider only the first tree
       • tree_method = 'exact' → naive exhaustive search
       • base_score = 0 → the initial prediction starts at 0
       • lambda = 0
       • gamma = 0
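The setting names resemble XGBoost parameters. Assuming that library and the squared-error loss from the MSE review slide, the settings could be written as the following parameter dictionary. The mapping of `loss_func` to `objective` and of `iteration` to a number of boosting rounds is an assumption made here, not something the slides state.

```python
# Assumed mapping of the slide's settings onto XGBoost-style parameter names.
params = {
    "objective": "reg:squarederror",  # squared-error loss (assumed mapping of loss_func)
    "eta": 1.0,                       # step size
    "tree_method": "exact",           # exhaustive split search
    "base_score": 0.0,                # initial prediction starts at 0
    "lambda": 0.0,                    # no L2 penalty on leaf weights
    "gamma": 0.0,                     # no per-leaf penalty
}
num_boost_round = 1  # "iteration = 1": only the first tree is considered
```

With `eta = 1`, `base_score = 0`, and a single tree, the model's prediction is just the leaf weight, which keeps the walk-through that follows easy to verify by hand.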
  9. Using Label Encoding (depth=1)
     Root {A, B, C, D}: $L_1 = -48797$. Candidate splits, with each child at its optimal weight $w^*_{left}$ or $w^*_{right}$:
     • {A} | {B, C, D}: $L_{2,left} = -35522$, $L_{2,right} = -21391$, $L_2 = -56913$
     • {A, B} | {C, D}: $L_{2,left} = -31832$, $L_{2,right} = -17951$, $L_2 = -49783$
     • {A, B, C} | {D}: $L_{2,left} = -56097$, $L_{2,right} = -996$, $L_2 = -57093$
  10. Using Label Encoding (depth=1)
      Same candidates as on the previous slide. Splitting at {A, B, C} | {D} looks best: $L_{2,left} = -56097$, $L_{2,right} = -996$, $L_2 = -57093$, the lowest of the three (the others are $-56913$ and $-49783$).
  11. Using Label Encoding (depth=2)
      Tree so far: {A, B, C, D} ($L_1 = -48797$) is split into {A, B, C} ($L_{2,left} = -56097$) and {D} ($L_{2,right} = -996$).
      Candidate splits of node {A, B, C} ($L_2 = -56097$):
      • {A} | {B, C}: $L_{3,left} = -35522$, $L_{3,right} = -24589$, $L_3 = -60111$
      • {A, B} | {C}: $L_{3,left} = -31832$, $L_{3,right} = -24937$, $L_3 = -56769$
  12. Using Label Encoding (depth=2)
      Same candidates as on the previous slide for node {A, B, C} ($L_2 = -56097$). Splitting at {A} | {B, C} looks best: $L_{3,left} = -35522$, $L_{3,right} = -24589$, $L_3 = -60111$, lower than $L_3 = -56769$ for {A, B} | {C}.
  13. Using Label Encoding (depth=3)
      Tree so far: {A, B, C, D} ($L_1 = -48797$) → {A, B, C} ($L_{2,left} = -56097$) and {D} ($L_{2,right} = -996$); {A, B, C} → {A} ($L_{3,left} = -35522$) and {B, C} ($L_{3,right} = -24589$).
      Candidate split of node {B, C} ($L_3 = -24589$):
      • {B} | {C}: $L_{4,left} = -4076$, $L_{4,right} = -24937$, $L_4 = -29013$
  14. Using Label Encoding (depth=3)
      Same candidate as on the previous slide for node {B, C} ($L_3 = -24589$). Splitting at {B} | {C} looks good: $L_{4,left} = -4076$, $L_{4,right} = -24937$, $L_4 = -29013$.
  15. Using Label Encoding (depth=3)
      Splitting finished. Final tree:
      {A, B, C, D} ($L_1 = -48797$)
      • {A, B, C} ($L_{2,left} = -56097$)
        • {A} ($L_{3,left} = -35522$)
        • {B, C} ($L_{3,right} = -24589$)
          • {B} ($L_{4,left} = -4076$)
          • {C} ($L_{4,right} = -24937$)
      • {D} ($L_{2,right} = -996$)
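The Label Encoding walk above can be sketched as a greedy split search over the encoded order. This is a simplified illustration, not the slides' code: it assumes 20 noise-free rows per level, the MSE loss with `base_score = 0` and `lambda = 0`, and the names `leaf_loss` and `grow` are made up here. With these idealized inputs the two outer first-level candidates have exactly equal loss, so the first split may differ from the one in the slides, but the resulting depth does not.

```python
MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}  # E[y|x] from the sample data
N_PER_LEVEL = 20  # assumption: equal, noise-free counts per level


def leaf_loss(levels):
    """Leaf loss -(sum g)^2 / (2 * sum h) with g = -y, h = 1, lambda = 0."""
    total = N_PER_LEVEL * sum(MEANS[lv] for lv in levels)
    return -0.5 * total ** 2 / (N_PER_LEVEL * len(levels))


def grow(levels, depth=0):
    """Greedily split an ordered run of levels; return the depth of the deepest leaf."""
    if len(levels) == 1:  # a single level cannot be split on this feature
        return depth
    # Candidate thresholds lie between neighbouring encoded values.
    k = min(
        range(1, len(levels)),
        key=lambda i: leaf_loss(levels[:i]) + leaf_loss(levels[i:]),
    )
    return max(grow(levels[:k], depth + 1), grow(levels[k:], depth + 1))


label_order = ["A", "B", "C", "D"]  # Label Encoding: A=0, B=1, C=2, D=3
label_depth = grow(label_order)
```

Because a threshold on the label-encoded value can only peel levels off in alphabetical order, the high-mean levels A and C cannot be grouped in one cut, and the tree needs three levels of splits to isolate every category.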
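  16. Using Target Encoding (depth=1)
      Root {A, B, C, D}: $L_1 = -48797$. The levels are now ordered by their target mean (D < B < C < A). Candidate splits, with each child at its optimal weight $w^*_{left}$ or $w^*_{right}$:
      • {D} | {B, C, A}: $L_{2,left} = -996$, $L_{2,right} = -56097$, $L_2 = -57093$
      • {D, B} | {C, A}: $L_{2,left} = -4551$, $L_{2,right} = -59992$, $L_2 = -64543$
      • {D, B, C} | {A}: $L_{2,left} = -21391$, $L_{2,right} = -35522$, $L_2 = -56913$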
  17. Using Target Encoding (depth=1)
      Same candidates as on the previous slide. Splitting at {D, B} | {C, A} looks best: $L_{2,left} = -4551$, $L_{2,right} = -59992$, $L_2 = -64543$, the lowest of the three (the others are $-57093$ and $-56913$).
  18. Using Target Encoding (depth=2)
      Tree so far: {A, B, C, D} ($L_1 = -48797$) is split into {D, B} ($L_{2,left} = -4551$) and {C, A} ($L_{2,right} = -59992$).
      Candidate splits:
      • Node {D, B} ($L_{2,left} = -4551$): {D} | {B}: $L_{3,left} = -996$, $L_{3,right} = -4076$, $L_3 = -5072$
      • Node {C, A} ($L_{2,right} = -59992$): {C} | {A}: $L'_{3,left} = -24937$, $L'_{3,right} = -35522$, $L'_3 = -60459$
  19. Using Target Encoding (depth=2)
      Same candidates as on the previous slide. Both splits look good: {D} | {B} under node {D, B} ($L_3 = -5072$, down from $L_{2,left} = -4551$) and {C} | {A} under node {C, A} ($L'_3 = -60459$, down from $L_{2,right} = -59992$).
  20. Using Target Encoding (depth=2)
      Splitting finished. Final tree:
      {A, B, C, D} ($L_1 = -48797$)
      • {D, B} ($L_{2,left} = -4551$)
        • {D} ($L_{3,left} = -996$)
        • {B} ($L_{3,right} = -4076$)
      • {C, A} ($L_{2,right} = -59992$)
        • {C} ($L'_{3,left} = -24937$)
        • {A} ($L'_{3,right} = -35522$)
  21. Comparison of Label Encoding and Target Encoding
      Tree built with Label Encoding (depth 3):
      {A, B, C, D} ($L_1 = -48797$)
      • {A, B, C} ($L_{2,left} = -56097$)
        • {A} ($L_{3,left} = -35522$)
        • {B, C} ($L_{3,right} = -24589$)
          • {B} ($L_{4,left} = -4076$)
          • {C} ($L_{4,right} = -24937$)
      • {D} ($L_{2,right} = -996$)
      Tree built with Target Encoding (depth 2):
      {A, B, C, D} ($L_1 = -48797$)
      • {D, B} ($L_{2,left} = -4551$)
        • {D} ($L_{3,left} = -996$)
        • {B} ($L_{3,right} = -4076$)
      • {C, A} ($L_{2,right} = -59992$)
        • {C} ($L'_{3,left} = -24937$)
        • {A} ($L'_{3,right} = -35522$)
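The Target Encoding side of this comparison can be sketched as follows. This is a minimal illustration under stated assumptions: 20 noise-free rows per level, plain mean encoding computed on the same rows (in practice the encoding is usually computed out-of-fold to limit target leakage), and the MSE loss with `base_score = 0` and `lambda = 0`. The helper `split_loss` and the thresholds are made up for the example.

```python
import numpy as np

MEANS = {"A": 60.0, "B": 20.0, "C": 50.0, "D": 10.0}  # E[y|x] from the sample data
x = np.repeat(list(MEANS), 20)          # assumption: 20 rows per level
y = np.array([MEANS[lv] for lv in x])   # assumption: noise-free targets

# Plain target encoding: replace each level by the mean target of that level.
encoding = {lv: float(y[x == lv].mean()) for lv in MEANS}
x_te = np.array([encoding[lv] for lv in x])

# Sorting the levels by encoded value gives the order the tree sees.
target_order = sorted(encoding, key=encoding.get)


def split_loss(threshold):
    """Sum of the two leaf losses for the split x_te < threshold (g = -y, h = 1)."""
    left, right = y[x_te < threshold], y[x_te >= threshold]
    return -0.5 * (left.sum() ** 2 / len(left) + right.sum() ** 2 / len(right))


outer_loss = split_loss(15.0)   # separates {D} from {B, C, A}
middle_loss = split_loss(35.0)  # separates {D, B} from {C, A}
```

After encoding, levels with similar targets sit next to each other, so a single threshold separates the low-mean pair from the high-mean pair. That is why the first split already yields a much lower loss than any single cut on the label-encoded feature.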
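  22. Won't the model sort things out by itself if we increase the depth / number of iterations?
      • Information that is known to be clearly useful is better passed to the model explicitly.
      • Label Encoding may still get there, but the model tends to become complex, and this becomes less realistic as the number of levels grows.
      • Empirically, anything known to clearly work is better handled before training.
      • It is the same story as interactions between numerical features.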
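  23. Summary
      • Target Encoding makes the model simpler.
      • With an MSE loss, in the early iterations, sorting the levels by the size of their residuals allows efficient splits.
      • The more levels there are, the larger the effect of Target Encoding.
      • Doing the equivalent of Target Encoding with Label Encoding requires a certain depth, which becomes less realistic as the number of levels grows.
      • The model may be able to do this implicitly without Target Encoding, but anything known to be clearly good is better handled before it is fed to the model.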