Upgrade to Pro — share decks privately, control downloads, hide ads and more …

SQLを活用したデータ分析におけるChatGPTの活用法

 SQLを活用したデータ分析におけるChatGPTの活用法

hikarut

May 20, 2023
Tweet

More Decks by hikarut

Other Decks in Technology

Transcript

  1. © )JLBSV5BLBIBTIJ  MJHIU ߴڮޫ )JLBSV5BLBIBTIJ ࣗݾ঺հ 1SPpMF ෳۀͰϓϩόεέοτϘʔϧϦʔάͷΫϥϒνʔϜͰϚʔέςΟϯάετϥςδετͱͯ͠ σʔλ෼ੳͷαϙʔτΛߦͬͨΓɺͦͷଞෳ਺ࣾͰ42-Λ׆༻ͨ͠σʔλ෼ੳͷαϙʔτܦ

    ݧ͋Γɻʰσʔλ෼ੳྗΛߴΊΔϏδωεύʔιϯͷͨΊͷ42-ೖ໳ʱͷஶॻɻ ୈճҙࢥܾఆͷͨΊͷσʔλ෼ੳษڧձʹͯʰҙࢥܾఆʹܨ͛ΔͨΊͷҼՌਪ࿦ʢ%*%ʣ ͱ1ZUIPOʹΑΔ$BVTBM*NQBDUʱͰൃදɻ ຊۀ ෳۀɾݸਓ׆ಈ ೥ʹ৽ଔͰϠϑʔגࣜձࣾʹೖࣾ͠ɺ8FCΤϯδχΞͱͯ͠αʔϏεͷ։ൃɺӡ༻ɺاըɺ ఏҊͳͲΛ୲౰ɻ೥͔Β͸גࣜձࣾΠʔϒοΫΠχγΞςΟϒδϟύϯʹग़޲͠ɺࣾ಺Ͱ σʔλαΠΤϯεάϧʔϓΛ্ཱͪ͛άϧʔϓϚωδϟʔ ෦௕ ͱͯࣾ͠಺ͷσʔλ׆༻Λଅਐɻ ೥͔Β֎ࢿܥίϯαϧςΟϯάاۀʹೖࣾ͠σʔλΛ׆༻ͨ͠ϚʔέςΟϯάࢧԉΛ࣮ࢪɻ
  2. © )JLBSV5BLBIBTIJ  σʔλ෼ੳͰҙࢥܾఆʹͭͳ͛ΔͨΊͷͭͷTUFQ σʔλ෼ੳʹ͓͚ΔͭͷTUFQ     

      TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ ՝୊ Ծઆ ૝ఆ ෼ੳ ݁Ռ ߟ࡯ ߦಈ w ͳͥ෼ੳΛ͢Δ ͷ͔ʢ8IZʣ w ͦ΋ͦ΋ͷ՝୊ Λઃఆ͢Δ w ෼ੳΛ͢Δ͜ͱ Ͱ໌Β͔ʹͨ͠ ͍͜ͱ͸Կ͔ɺ ͦͷͨΊʹͲΜ ͳԾઆ͕͋Δͷ ͔Λߟ͑Δ w ෼ੳͱ͸ԾઆΛ ݕূ͢Δ͜ͱ w ෼ੳ݁Ռ͕෼ ͔ͬͨΒͲΜͳ ҙࢥܾఆ͕Ͱ͖ Δ͔ʁ w ࣄલʹ૝ఆΞΫ γϣϯΛߟ͑ͯ ͓͘͜ͱͰ෼ੳ ͔Βҙࢥܾఆ͠ ΍͘͢͢Δ w ͲͷΑ͏ʹ෼ੳ Λߦ͏͔ʁ w ۩ମతͳ෼ੳख ஈɻ42-Λ࢖ͬ ͨ෼ੳ΍ 1ZUIPOΛ࢖ͬ ͨ෼ੳͳͲ w ࡍͷ෼ੳ݁Ռ͕ Ͳ͏ͩͬͨͷ ͔ʁ w ਺஋ͱͯ͠ͷ ϑΝΫτ৘ใΛ ·ͱΊΔ w ෼ੳ݁Ռ͔ΒͲ ͷΑ͏ͳղऍ͕ Ͱ͖Δͷ͔ɺͲ ΜͳఏҊ͕Ͱ͖ ͦ͏͔ʁ w ϑΝΫτ৘ใΛ ͲͷΑ͏ʹղऍ ͯ͠ɺͲΜͳࣔ ࠦΛग़͔͢ w ෼ੳ݁ՌΛड͚ ͯ࣍ͷΞΫγϣ ϯ͸Ͳ͏͢Δͷ ͔ʁ w ͲΜͳҙࢥܾఆ Λͨ͠ͷ͔ʢ͢ Δͷ͔ʣ
  3. © )JLBSV5BLBIBTIJ  σʔλ෼ੳʹ͓͚ΔͭͷTUFQɹ۩ମྫ σʔλ෼ੳʹ͓͚ΔͭͷTUFQ     

      TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ ՝୊ Ծઆ ૝ఆ ෼ੳ ݁Ռ ߟ࡯ ߦಈ w ೥͔Β&$ αΠτͷച্͕ Լ͕͍ͬͯΔɻ &$αΠτͷച্ Λ্͛ΔͨΊʹ Ͳ͏͢Ε͹ྑ͍ ͔ w ঎඼"Λߪೖ͠ ͍ͯΔϢʔβʔ ͸঎඼#΋Ұॹ ʹߪೖ͍ͯ͠Δ ͷͰ͸ͳ͍͔ w ঎඼"Λങͬͯ Δਓʹ঎඼#Λ ϝʔϧͰ͓קΊ ͢Δ w %8)͔Β42- Λ࢖ͬͯόε έοτ෼ੳΛ࣮ ࢪ w ঎඼"Λߪೖ͠ ͍ͯΔਓ͕ ਓ w ͦͷ͏ͪ঎඼# Λߪೖ͍ͯ͠Δ ਓ͕ਓ w ͦͷ͏ͪ঎඼$ Λߪೖ͍ͯ͠Δ ਓ͕ਓ w ঎඼"͸঎඼# ΑΓ΋঎඼$ͷ ํ͕ҰॹʹങΘ Ε͓ͯΓɺ঎඼ "ͱ঎඼$ͷ૬ ੑ͕Α͍ w ঎඼"Λߪೖ͠ ͯ঎඼$Λߪೖ ͍ͯ͠ͳ͍ਓʹ ঎඼$ͷׂҾ ΫʔϙϯΛϝʔ ϧͰ഑෍͢Δ͜ ͱΛܾఆ
  4. © )JLBSV5BLBIBTIJ  $IBU(15ʹΑͬͯஔ͖׵ΘΔ΋ͷɾஔ͖׵ΘΒͳ͍΋ͷ σʔλ෼ੳʹ͓͚ΔͭͷTUFQ     

      TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ TUFQ ՝୊ Ծઆ ૝ఆ ෼ੳ ݁Ռ ߟ࡯ ߦಈ º ˓ ˚ ˕ ˕ ˚ º ՝୊ͷઃఆ͸ਓ͕ߦ͏͕ɺ՝୊͕ܾ· Ε͹ԾઆΛग़͢͜ͱ΋$IBU(15Ͱ୅ ସՄೳ σʔλͷղऍ͸Ұൠ࿦ ͱͯ͠͸$IBU(15ͰҰ ఆͷߟ࡯ม͑ΒΕΔ͕ ҙࢥܾఆ͸ਓ͕ؒߦ͏ 1ZUIPO΍42-Λ࢖ͬͨ ۩ମతͳ෼ੳ͸ $IBU(15ʹΑͬͯ୅ସ ͞Ε΍͍͢
  5. © )JLBSV5BLBIBTIJ  42-Λ࢖ͬͨσʔλ෼ੳ σʔλ෼ੳʹ͓͚ΔͭͷTUFQʮ෼ੳɾ݁Ռʯ େاۀͳͲͰΑ͋͘ΔҰൠతͳσʔλ෼ੳ؀ڥ 42- 42- ෼ੳ݁Ռ σʔλ

    42-Λ࢖͏͜ͱͰͦͷ࣌ʑʹԠͯ͡ඞཁͳσʔλ Λૉૣ͘ूܭ͢Δ͜ͱ͕Ͱ͖Δ w 42-͕࢖͑Δͱ൚༻తʹσʔλ෼ੳʹ࢖͑Δ w ͔͠͠42-Λ࢖ͬͨ෼ੳ͸$IBU(15ʹஔ͖׵Θͬͯ͠·͏ʂʁ جװγεςϜ %8)ʢσʔλ΢ΣΞϋ΢εʣ #*πʔϧ
  6. © )JLBSV5BLBIBTIJ  $IBU(15ͷϓϩϯϓτ $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ࣍ͷ৚݅ʹै͍ɺ೥ͷ೥ؒͷϢʔβʔ͝ͱͷߪ ೖ݅਺ʹԠͯ͡).- ϔϏʔɺϛυϧɺϥΠτ ʹ෼ ྨ͠ɺ).-͝ͱͷϢʔβʔ਺Λूܭ͍ͯͩ͘͠͞ɻ

    ·ͨɺ͜ͷͱ͖Ϣʔβʔ৘ใ͕ۭͷσʔλͱΩϟϯη ϧͷ͋Δߪೖσʔλ͸আ֎͍ͯͩ͘͠͞ɻ ϔϏʔ೥ؒͷߪೖ݅਺͕݅Ҏ্ ϛυϧ೥ؒͷߪೖ݅਺͕݅Ҏ্ɺ݅ະຬ ϥΠτ೥ؒͷߪೖ݅਺͕݅ະຬ ˞ʰσʔλ෼ੳྗΛߴΊΔϏδωεύʔιϯͷͨΊͷ42-ೖ໳ʱୈষԋश໰୊ΑΓ ࣄલʹςʔϒϧͷεΩʔϚΛΠϯϓοτ ໰୊ʢϓϩϯϓτʣ &$αΠτͰ).-෼ੳΛߦ͏ྫ
  7. © )JLBSV5BLBIBTIJ  ਓ͕ॻ͍ͨ42-ͱ$IBU(15͕ॻ͍ͨ42-ͷൺֱ $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ᶃਓؒ ᶄ$IBU(15 ᶅ42-$3&"503 WITH hml_users

    AS( SELECT user_id, CASE WHEN COUNT(DISTINCT order_id) >= 500 THEN 'ϔϏʔ' WHEN COUNT(DISTINCT order_id) >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS hml FROM orders WHERE user_id IS NOT NULL AND order_date BETWEEN '2022-01-01' AND '2022-12-31' AND is_canceled = 0 GROUP BY user_id ) SELECT hml, COUNT(DISTINCT user_id) AS uu FROM hml_users GROUP BY hml SELECT CASE WHEN order_counts >= 500 THEN 'ϔϏʔ' WHEN order_counts >= 300 AND order_counts < 500 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS hml, COUNT(DISTINCT orders.user_id) AS user_count FROM orders JOIN ( SELECT user_id, COUNT(*) AS order_counts FROM orders JOIN products ON orders.order_product_id = products.product_id WHERE orders.is_canceled = 0 AND orders.order_date >= '2022-01-01' AND orders.order_date < '2023-01-01' GROUP BY user_id ) AS user_orders ON orders.user_id = user_orders.user_id WHERE orders.is_canceled = 0 AND orders.order_date >= '2022-01-01' AND orders.order_date < '2023-01-01' GROUP BY hml; SELECT CASE WHEN purchases_per_user.user_purchases >= 500 THEN 'ϔϏʔ' WHEN purchases_per_user.user_purchases >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS HML, COUNT(DISTINCT orders.user_id) AS user_count FROM orders INNER JOIN ( SELECT user_id, COUNT(*) AS user_purchases FROM orders WHERE orders.is_canceled = 0 AND orders.order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY user_id ) AS purchases_per_user ON orders.user_id = purchases_per_user.user_id INNER JOIN users ON orders.user_id = users.user_id AND users.is_deleted = 0 GROUP BY HML;
  8. © )JLBSV5BLBIBTIJ  ਓ͕ॻ͍ͨ42-ͱ$IBU(15͕ॻ͍ͨ42-ͷൺֱ $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ᶃਓؒ ᶄ$IBU(15 ᶅ42-$3&"503 ˓ ˕

    ˚ w ݁Ռ͸ਖ਼ղ͕ͩগ͠42- ͕৑௕ w ༨෼ͳ৚͕݅ೖͬͯඍົ ʹҧ͏݁Ռ w 42-͕গ͠৑௕ w XJUI۟Λ࢖ͬͯՄಡੑͷ ߴ͍42-Λҙࣝ
  9. © )JLBSV5BLBIBTIJ  $IBU(15Ͱ42-Λग़ྗ͢Δ࣌ͷϙΠϯτ ϓϩϯϓτΛมߋͯ͠42-Λग़ྗ ࣍ͷ৚݅ʹै͍ɺ೥ͷ೥ؒͷϢʔβʔ͝ͱͷߪೖ݅਺ʹԠͯ͡).- ϔϏʔɺϛυϧɺϥΠτ ʹ ෼ྨ͠ɺ).-͝ͱͷϢʔβʔ਺Λूܭ͍ͯͩ͘͠͞ɻ·ͨɺ͜ͷͱ͖Ϣʔβʔ৘ใ͕ۭͷσʔλͱΩϟ ϯηϧͷ͋Δߪೖσʔλ͸আ֎͍ͯͩ͘͠͞ɻσʔλϕʔε͸42-JUFͰ͢ɻ·ͨɺ42-͸XJUI۟Λ࢖ͬ

    ͯՄಡੑͷߴ͍42-ʹ͍ͯͩ͘͠͞ɻ ϔϏʔ೥ؒͷߪೖ݅਺͕݅Ҏ্ ϛυϧ೥ؒͷߪೖ݅਺͕݅Ҏ্ɺ݅ະຬ ϥΠτ೥ؒͷߪೖ݅਺͕݅ະຬ ໰୊ʢϓϩϯϓτʣ ✔࢖͍ͬͯΔσʔλϕʔεΛࢦఆ͢Δʢ42-JUFɺ.Z42-ɺ1PTUHSF42-ͳͲʣ ✔ʮXJUI۟Λ࢖ͬͯʯͳͲՄಡੑΛߴΊΔΑ͏ʹ໌ࣔతʹࢦఆ͢Δ $IBU(15Ͱ42-Λग़ྗ͢Δ࣌ͷϙΠϯτ
  10. © )JLBSV5BLBIBTIJ  ଟগ͸վળ͞ΕΔ͕·ͩ৑௕ͳ42- $IBU(15Λ׆༻ͨ͠σʔλ෼ੳʹ͓͚Δ42-ͷੜ੒ ᶄ$IBU(15 ᶅ42-$3&"503 WITH -- 2022೥ʹ஫จ͞Εͨߪೖσʔλͷநग़

    order_2022 AS ( SELECT * FROM orders WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31' AND is_canceled = 0 ), -- Ϣʔβʔ͝ͱͷ೥ؒߪೖ݅਺ͷܭࢉ user_purchase_count AS ( SELECT user_id, COUNT(DISTINCT order_id) AS purchase_count FROM order_2022 GROUP BY user_id ), -- ϢʔβʔΛHML͝ͱʹ෼ྨ͢Δ user_hml AS ( SELECT user_id, CASE WHEN purchase_count >= 500 THEN 'ϔϏʔ' WHEN purchase_count >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS hml FROM user_purchase_count ), -- HML͝ͱͷϢʔβʔ਺Λूܭ͢Δ hml_user_count AS ( SELECT hml, COUNT(DISTINCT user_id) AS user_count FROM user_hml GROUP BY hml ) -- ࠷ऴతͳग़ྗ݁Ռ SELECT hml, user_count FROM hml_user_count ORDER BY hml DESC WITH purchase_counts AS ( SELECT user_id, COUNT(*) AS cnt FROM orders WHERE is_canceled = 0 AND order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY user_id ) SELECT CASE WHEN cnt >= 500 THEN 'ϔϏʔ' WHEN cnt >= 300 THEN 'ϛυϧ' ELSE 'ϥΠτ' END AS HML, COUNT(*) AS user_count FROM purchase_counts JOIN users ON purchase_counts.user_id = users.user_id WHERE users.is_deleted = 0 GROUP BY HML;
  11. © )JLBSV5BLBIBTIJ  (PPHMFͷੜ੒"*ʰ#BSEʱ 0OFNPSFUIJOH IUUQTCBSEHPPHMFDPN #BSE w (PPHMF͕։ൃͨ͠ੜ੒త"*Ͱɺ࠷৽ͷ"* ج൫Ϟσϧʮ1B-.ʯΛར༻

    w ೥݄೔࣌఺Ͱ΢ΣΠτϦετ͕ ͳ͘ͳΓΧࠃҎ্ͰӳޠͰͷར༻͕ ՄೳʹͳΓɺ೔ຊޠʹ΋ରԠ w ೥݄೔࣌఺Ͱ͸ʮࢼݧӡ༻൛ʯ
  12. © )JLBSV5BLBIBTIJ  (PPHMFͷੜ੒"*ʰ#BSEʱ 0OFNPSFUIJOH #BSE SELECT COUNT(DISTINCT user_id) AS

    user_count, CASE WHEN COUNT(order_id) >= 500 THEN 'ϔϏʔ' WHEN COUNT(order_id) >= 300 AND COUNT(order_id) < 500 THEN 'ϛυϧ' WHEN COUNT(order_id) < 300 THEN 'ϥΠτ' END AS category FROM orders WHERE user_id IS NOT NULL AND is_canceled = FALSE AND order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY category; w ΤϥʔͰ࣮ߦ݁Ռ͕ฦͬͯ͜ͳ͍ w ͨͩ͠ɺ#BSE͸ෳ਺ͷճ౴͕ฦͬ ͯ͘ΔͷͰɺҰ෦ਖ਼ղ΋ฦͬͯ͘Δ º
  13. © )JLBSV5BLBIBTIJ  ·ͱΊ 4VNNBSZ σʔλ෼ੳʹ͸̓ͭͷTUFQ͕͋Γ$IBU(15ͳͲͷ"*ʹΑͬͯ୅ ସ͞ΕΔ෦෼ͱ͞Εͳ͍෦෼͕͋Δ  w ద੾ͳ໰͍ͷઃఆʢ՝୊ʣͱҙࢥܾఆʢߦಈʣ͸୅ସ͞Εͳ͍

    w 42-ͷੜ੒ͳͲ෼ੳ΍݁Ռͷநग़͸$IBU(15ʹΑͬͯ୅ସ͞ΕΔՄೳੑ͸͋Δ $IBU(15΍#BSEΛਖ਼͘͠׆༻͢ΔͨΊͷೳྗ͕ඞཁ  w ద੾ͳ໰͍Λઃఆͯ͠ϓϩϯϓτΛ౤͛Δٕज़ w ؒҧͬͨ42-͕ฦͬͯ͘Δ͜ͱ΋͋ΔͷͰɺग़ྗ͞Εͨ42-͕ਖ਼͍͔͠Λ൑அ͢Δ͚ͩͷೳྗ͕ඞཁ w ͢Ͱʹ஌ͬͯΔ΋ͷΛΑΓޮ཰తʹऔಘ͍ͨ͠৔߹ʹศར w ͦͷͨΊʹجຊతͳ42-ͳͲͷٕज़͸ֶΜͩํ͕Α͍ʢద੾ͳࢦࣔΛग़ͨ͢Ίɺ݁Ռ͕ਖ਼͍͔͠Λ൑அ͢Δ ͨΊʣ
  14. &/%