Data Utilization at a toC (B2C) Company (PyData.Okinawa + PythonBeginners Okinawa Joint Study Meetup 2019)

takegue
June 15, 2019

Transcript

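  1. Who am I: Shunsuke Takeno (竹野 峻輔, @takegue). Retty ← master's degree (NLP; machine translation) ← KOSEN (technical college)

    Core Value: Data Architect
    Building the mechanisms and designs that maximize the value of data

    Writing:
    "Headline extraction from recommendation texts via preference learning" (「優先度学習による推薦文からの見出し抽出」)
    "Let's try it! Machine learning (Software Design)" (「やってみよう！機械学習」)
    "Machine learning by trial: an introduction" (「試して学ぶ 機械学習入門」), and others…

    Possibly "advanced AI talent"?
    Other: https://shwca.se/takegue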
  2. There are many kinds of data-related jobs
    • Data Scientist
    • Data Infrastructure Engineer
    • ML Engineer / SysML
    • BI Engineer / Data Platform Engineer
    • Data Visualization Engineer / Data Analyst
    • Data Application Engineer
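  3. There are many kinds of data-related tasks
    [Figure: tasks placed on two axes, abstract ↔ concrete and application ↔ tool. Tasks shown: recommendation, search, content generation, multilingual support, advertising, content monitoring, monitoring (anomaly detection etc.), revenue forecasting, automated QA, exploratory data analysis, metrics development, performance analysis, hypothesis testing.]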
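  4. There are many kinds of data-related tasks
    [Figure: the same two-axis map as slide 3 (abstract ↔ concrete, application ↔ tool), with the tasks in this order: recommendation, content generation, multilingual support, advertising, content monitoring, monitoring (anomaly detection etc.), revenue forecasting, automated QA, exploratory data analysis, metrics development, performance analysis, hypothesis testing, search.]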
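  5. Product thinking
    • As a quality bar, does it avoid damaging "trust"?
    • It must not lie
    • There are stylistic preferences that express the service's character, such as not sounding colloquial
    • Ideally it becomes a strength of the service
    • Its role is that of a catchphrase: it has to be an attractive sentence
    • We are not looking for bland, average text
    • It should integrate easily into the product
    • Character-count limits are an issue (PC and smartphone)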
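  6. Science thinking
    • What makes a sentence feel attractive?
      ◦ Designing the evaluation
        ▪ A persuasive catchphrase → CTR probably goes up?
        ▪ A catchphrase that raises CTR ≠ persuasive?
    • What makes something read like a catchphrase?
      ◦ Fluency as a catchphrase ≠ fluency as a sentence
      ◦ The sentence may be somewhat broken grammatically (rhythm helps)
      ◦ "The spaghetti at this restaurant is delicious."
      ◦ "Serving exquisite spaghetti!"
    • Is a catchphrase a summary of user reviews?
    • How much of this is within reach of NLP in the first place?
      ◦ Can fluent sentences be generated mechanically?
        ▪ Insight from a machine learning background
      ◦ What experimental setup would let us make good progress?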
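  7. Engineering: making it work given all of the above
    • How much effort does it take to ship into the product?
      ◦ Just load the results into the DB and change a little code. Quick and easy!
      ◦ That leaves more time for trial and error
    • We do not need the most accurate method available
      ◦ The method itself need not be novel. Novel insights are what we want
      ◦ Prefer something we can try, evaluate, and improve
      ◦ No very-large-scale training with long training times at the start
    • Understand the datasets we currently have
      ◦ There are quite a lot of catchphrase sentences (a little over 200,000), and plenty of user reviews too!
      ◦ There is no dataset fully prepared for generation, and this is a short sprint (1.0 month)
      ◦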
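  8. For example …
    • Template-oriented
      ◦ Prepare a very large number of templates with slots, pseudo-generate a large number of sentences, and pick the good ones
        ▪ Fluency can be evaluated with a language model trained on catchphrases!
        ▪ How do we supply the catchphrase templates?
    • Full sentence generation
      ◦ "Let's go GAN-GAN!" (i.e. use GANs)
      ◦ Never done it before, and it sounds fun
    • Review summarization
      ◦ Summarize user reviews into a sentence within the character limit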
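The template-oriented option on slide 8 can be sketched roughly as follows. This is only an illustration, not the speaker's implementation: the templates, slot fillers, and the tiny corpus are all hypothetical, and an add-one-smoothed bigram model stands in for "a language model trained on catchphrases".

```python
import itertools
import math
from collections import Counter

# Hypothetical templates with slots, and hypothetical slot fillers.
TEMPLATES = ["{adj} {dish} served fresh", "the {adj} {dish} in town"]
SLOTS = {"adj": ["best", "homemade"], "dish": ["udon", "soba"]}

# Hypothetical corpus of existing catchphrases used to fit the language model.
CORPUS = [
    "the best udon in town",
    "homemade soba served fresh",
    "the best soba in town",
]


def bigrams(sentence):
    """Token bigrams with sentence boundary markers."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


bigram_counts = Counter(b for s in CORPUS for b in bigrams(s))
context_counts = Counter(b[0] for s in CORPUS for b in bigrams(s))
vocab = {t for s in CORPUS for t in s.split()} | {"<s>", "</s>"}


def fluency(sentence):
    """Average add-one-smoothed bigram log-probability (higher is more fluent)."""
    pairs = bigrams(sentence)
    total = sum(
        math.log((bigram_counts[p] + 1) / (context_counts[p[0]] + len(vocab)))
        for p in pairs
    )
    return total / len(pairs)


def generate():
    """Fill every template with every combination of slot values."""
    keys = sorted(SLOTS)
    for template in TEMPLATES:
        for values in itertools.product(*(SLOTS[k] for k in keys)):
            yield template.format(**dict(zip(keys, values)))


candidates = list(generate())
# Pseudo-generate many sentences, then keep the ones the language model prefers.
ranked = sorted(candidates, key=fluency, reverse=True)
```

The open question from the slide remains in this sketch: the quality of the output is bounded by how good the hand-supplied templates are.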
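  9. What did we do?
    • Modeled the problem with a summarization (extraction) oriented approach
    • Modeled it as building a binary classifier that, given two sentences, learns the order relation >= between examples

    It is hard to measure a good catchphrase with an absolute metric, but a relative relation is easy to define.
    Once an order relation is defined, we can sort!

    $$ f(X_1, X_2) = F(\phi(X_1) - \phi(X_2)) = \begin{cases} 1, & \text{if } X_1 \ge X_2 \\ 0, & \text{otherwise} \end{cases} $$

    Example: f("The miso soup at this restaurant is tasty", "Homemade miso soup, just like mom used to make!")
    = "The miso soup at this restaurant is tasty" =< "Homemade miso soup, just like mom used to make!"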
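To make "once an order relation is defined, we can sort" concrete, here is a minimal sketch of f(X1, X2) = F(φ(X1) − φ(X2)). It assumes φ is a bag-of-words map and F is a threshold on a linear score; the weights and example sentences are hypothetical, and in practice the weights would be learned rather than written by hand.

```python
from collections import Counter
from functools import cmp_to_key

# Hypothetical feature weights; a trained classifier would supply these.
WEIGHTS = {"famous": 1.5, "fresh": 1.0, "queue": 0.8, "is": -0.5, "this": -0.7}


def phi(sentence):
    """Bag-of-words feature map: token -> count."""
    return Counter(sentence.lower().split())


def f(x1, x2):
    """Return 1 if x1 >= x2 under the (hypothetical) learned order, else 0."""
    diff = phi(x1)
    diff.subtract(phi(x2))  # phi(x1) - phi(x2), keeping negative counts
    margin = sum(WEIGHTS.get(tok, 0.0) * cnt for tok, cnt in diff.items())
    return 1 if margin >= 0 else 0


def compare(x1, x2):
    """Three-way comparator built from f; better sentences sort first."""
    ge, le = f(x1, x2), f(x2, x1)
    if ge and le:
        return 0
    return -1 if ge else 1


sentences = [
    "this udon is good",
    "famous udon with a queue all day",
    "fresh soba",
]
ranked = sorted(sentences, key=cmp_to_key(compare))
```

Because F is applied to a linear function of the feature difference, the pairwise decision is consistent with a single per-sentence score, which is what makes sorting well defined here.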
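  10. What did we do?
    • Generated a large amount of pseudo-labeled data by assuming: catchphrase sentence >= sentence cut at random from a user review
      With a large number of training examples and an output dimension of only 2, this looked learnable
    • Used logistic regression with online updates as the classifier, so that training handles the data volume without trouble
      (using sklearn.linear_model.SGDClassifier)
    • Given enough consistent data, NN-based machine learning can learn just about any mapping
    Details omitted (http://www.orsj.or.jp/archive2/or62-11/or62_11_731.pdf)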
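The pseudo-data idea on slide 10 can be sketched as below. The slide names sklearn.linear_model.SGDClassifier; this sketch is a dependency-free stand-in that implements online logistic regression by hand, so it should not be read as that library's API or as the speaker's code. The toy sentences, the bag-of-words features, and the learning rate are all assumptions.

```python
import math
import random
from collections import Counter

# Hypothetical toy data standing in for the real corpora.
CATCHPHRASES = [
    "famous udon shop with a long queue",
    "freshly ground soba served at once",
    "legendary ramen loved by locals",
]
REVIEW_FRAGMENTS = [
    "i went there with my friend",
    "it was raining so we took a taxi",
    "the udon was good i think",
]

weights = {}  # token -> weight, updated online


def pair_features(x1, x2):
    """phi(x1) - phi(x2) with bag-of-words phi."""
    diff = Counter(x1.split())
    diff.subtract(Counter(x2.split()))
    return diff


def predict_proba(x1, x2):
    """Estimated probability that x1 >= x2."""
    z = sum(weights.get(t, 0.0) * v for t, v in pair_features(x1, x2).items())
    return 1.0 / (1.0 + math.exp(-z))


def update(x1, x2, label, lr=0.1):
    """One stochastic gradient step on the logistic loss for a single pair."""
    error = predict_proba(x1, x2) - label
    for t, v in pair_features(x1, x2).items():
        weights[t] = weights.get(t, 0.0) - lr * error * v


rng = random.Random(0)
for _ in range(500):
    c = rng.choice(CATCHPHRASES)
    r = rng.choice(REVIEW_FRAGMENTS)
    update(c, r, 1)  # pseudo label: catchphrase >= random review fragment
    update(r, c, 0)  # the reversed pair gets the opposite label
```

Each update touches only one pair, so the data can be streamed. That is the property the slide relies on when it picks an online-updatable logistic regression.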
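  11. What did we do?
    ◦ Good user reviews give the restaurant a good catchphrase!
      • This matches the service's values: users' reviews support the restaurants
    ◦ Highly flexible: by varying how candidate sentences are generated at inference time, catchphrases can be produced in many different patterns
    ◦ Highly interpretable, because the model is very simple
      ▪ Analyze the feature weights of the logistic regression
      ▪ Word unigrams: show which distinctive words tend to be used in catchphrases
      ▪ Word n-grams: capture writing style. With unknown-word handling, templates can be acquired as well.
    ◦ Because we build an evaluator rather than a generator, it fits well with operations
      ▪ With crowdsourcing, it can be used to train beginners
      ▪ It can also be applied as a filter on deliverables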
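The two uses named on slide 11, reading the weights and using the evaluator as a filter, might look like the sketch below. The weights are hypothetical values written by hand for illustration, and the threshold is an arbitrary assumption.

```python
# Hypothetical learned unigram weights (token -> weight).
WEIGHTS = {
    "famous": 1.8,
    "queue": 1.1,
    "fresh": 0.9,
    "i": -1.2,
    "went": -0.9,
    "the": -0.1,
}


def top_features(weights, k=3):
    """Tokens with the largest weights: words typical of catchphrases."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def score(sentence, weights):
    """Linear score of a sentence under a bag-of-words model."""
    return sum(weights.get(tok, 0.0) for tok in sentence.lower().split())


def keep_good(candidates, weights, threshold=0.0):
    """Use the evaluator as a filter: keep candidates scoring above threshold."""
    return [c for c in candidates if score(c, weights) > threshold]


kept = keep_good(["famous udon", "i went home"], WEIGHTS)
```

The same filter could be applied to human-written drafts, which is the operational fit the slide points to.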
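  12. Results: what kind of catchphrases do we get?
    In an in-house human evaluation, the machine-produced catchphrases were quantitatively, and significantly, better than human-written ones.
    Covering every restaurant was not feasible, but with specific filters in place we were able to ship it into the product.
    ◦ (human) "The strongest udon in Tokyo" (都内最強のうどん)
    ◦ (machine) "A famed udon shop with long queues from morning till night" (朝から晩まで長蛇の列ができる名店うどん屋さん)
    ◦ (machine) "Soba from Ninohe milled in-house and served 'three ways fresh': freshly ground, freshly made, freshly boiled" (二戸産のそばを店内で製粉し、挽きたて・打ちたて・茹でたての「３たて」で提供)
    ◦ (human) "Firm, chewy soba enjoyed in an old farmhouse that feels like a visit to grandma's is delicious" (お婆ちゃんの家に遊びに来たような古民家で頂くコシの強いお蕎麦は美味)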
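  14. Lessons on data utilization (machine learning) in the toC domain
    Good data makes the problem simple
      ▪ A good problem formulation admits multiple solutions (Simple > Easy)
      ▪ Specializing to the domain (service / industry / task) makes the problem simpler still
    Guaranteeing a consistent quality level is very hard
      ◦ It tests your ability to design a good pipeline from model to system
      ◦ Problems that academia would evaluate and solve one at a time have to be solved simultaneously
      ◦ For catchphrases, fluency / reliability (interpretability) / flexibility all had to be satisfied at once
      ◦ People are relatively tolerant of human mistakes but intolerant of machine-like mistakes (we want to err the way humans do)
    In-house evaluation, such as A/B testing, is an advantage
      ◦ The results are specific to the service, so reproducibility on other services cannot be guaranteed…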
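  15. How should we go about using data?
    [Figure: a plot with the axes "difficulty of the problems we can solve" and "quality of the problem we should solve". Labels: "rise of NNs", "true GOAL", "ideal", "Where we think we are?", "Where we might actually be?". From https://simplystatistics.org/2019/04/17/tukey-design-thinking-and-better-questions/]
    We need to come up with good questions that close the gap.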
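  16. Look at what the giants are doing
    - Netflix: event information
    - Locale information
    - Experiment (initiative) number
    - Release number
    - Which content was viewed
    - Which content was displayed
    - At what timing the content moved …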