A mostly serverless data platform on Google Cloud that even one person can start small / Serverless Dataplatform for Google Cloud

Lightning talk slides for the Jagu'e'r Data Utilization Subcommittee

Shinichi Nakagawa

December 02, 2022

Transcript

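  1. A mostly serverless data platform on Google Cloud that even one person can start small
     ⚾ A "personal DX" strategy for making baseball watching more convenient, and its implementation
     Shinichi Nakagawa, 2022/12/02, Jagu'e'r Data Utilization Subcommittee #8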
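  2. Who am I ?
     • Shinichi Nakagawa (中川伸一)
     • Manager, Technology Consulting, Accenture Japan Ltd
     • Previous work: engineer at startups and large tech ventures
     • At Accenture: delivery of Google Cloud related projects
     • As an individual, I build products (≈ a hobby) for the following purposes:
       • Baseball data analysis
       • My own healthcare
       • Technical validation on the themes above
     • Favorite Google Cloud services: BigQuery, Cloud Run
     • Favorite Baseball Humans: Tsuyoshi Shinjo, Chusei Mannami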
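  3. Big data in Major League Baseball
     • Major League Baseball records all kinds of data with a system called "Statcast".
       ※ As a rule, data is recorded by measuring equipment such as cameras and radar (some values are recorded by hand or estimated).
     • For example, the material behind play-by-play and commentary all comes from this "Statcast" big data.
       • "Ohtani-san! Home run No. ○! Exit velocity 180 km/h, distance 130 m"
       • "Ohtani-san! Called strike three on a 162 km/h fastball!!!"
     • Every single move in a game is captured: all pitch and batted-ball data is recorded.
     • A regular season (30 teams, 162 games) comes to roughly 700,000 to 800,000 pitches. Postseason and spring training data also exist.
     • Each record has 91 fields (!?); a regular season amounts to roughly 400 MB to 600 MB of data.
     • Anyone can browse and download the data (in CSV format) at baseballsavant.mlb.com.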
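Statcast CSV exports record speeds in miles per hour and distances in feet, while the examples on this slide are quoted in km/h and metres. The following is a minimal sketch of that conversion, assuming the export contains `launch_speed` (mph) and `hit_distance_sc` (feet) columns. The sample row is invented for illustration and is not real data.

```python
import csv
import io

MPH_TO_KMH = 1.609344   # exact: 1 mile = 1.609344 km
FEET_TO_M = 0.3048      # exact: 1 foot = 0.3048 m

# Hypothetical two-column excerpt of a Statcast-style CSV (values invented).
SAMPLE_CSV = "launch_speed,hit_distance_sc\n111.8,426\n,\n"


def to_metric(csv_text):
    """Return batted balls as dicts in km/h and metres, skipping blank rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["launch_speed"] or not row["hit_distance_sc"]:
            continue  # rows without batted-ball values are skipped
        rows.append({
            "exit_velocity_kmh": round(float(row["launch_speed"]) * MPH_TO_KMH, 1),
            "distance_m": round(float(row["hit_distance_sc"]) * FEET_TO_M, 1),
        })
    return rows


print(to_metric(SAMPLE_CSV))
```

Rows with empty values are dropped rather than treated as zero, since most pitches are never put in play.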
  4. Ohtani-san in 2022: turning into a slider, two-seam, and cutter specialist
     • This year Ohtani-san is throwing a huge number of sliders.
     • Did you notice? In the second half of the season, his two-seamers (labeled Sinker in the data) are on the rise!?
     • With the slider, two-seamer, and cutter, he has reinvented himself as a pitcher who throws nasty breaking balls.
     Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022
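The kind of comparison on this slide (how the pitch mix shifts between the first and second half of a season) can be sketched as below. This is only an illustration: the pitch log is invented, the July split point is an assumption, and the codes `SL`, `SI`, `FC`, and `FF` follow Statcast's pitch-type abbreviations for slider, sinker, cutter, and four-seam fastball.

```python
from collections import Counter, defaultdict

# Invented pitch log: (game_month, pitch_type). Not real data.
PITCHES = [
    (4, "FF"), (4, "SL"), (5, "SL"), (6, "FF"),
    (8, "SI"), (8, "SL"), (9, "SI"), (9, "FC"),
]


def pitch_mix_by_half(pitches, split_month=7):
    """Return {half: {pitch_type: share}}, where shares sum to 1 per half."""
    counts = defaultdict(Counter)
    for month, pitch_type in pitches:
        # split_month is a hypothetical cut-off between season halves.
        half = "first" if month < split_month else "second"
        counts[half][pitch_type] += 1
    return {
        half: {p: n / sum(c.values()) for p, n in c.items()}
        for half, c in counts.items()
    }


mix = pitch_mix_by_half(PITCHES)
print(mix["first"].get("SI", 0.0), mix["second"].get("SI", 0.0))
```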
  5. This presentation makes reference to marks owned by third parties.

    Unless otherwise noted, all such third-party marks are the property of their respective owners. No sponsorship, endorsement or approval of this content by the owners of such marks is intended, expressed or implied.
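  6. Architecture walkthrough (≈ the points I cared about)
     • To check and refresh the data every day with little effort, the platform is built and operated entirely on fully managed, serverless cloud services.
     • Basic policy for choosing services:
       • Start from "the DWH is BigQuery" and design the ETL and the application around it (because I wanted to use BQ).
       • Keep each component independent as a microservice, since that makes it easy to build and easy to test.
       • Each piece is a small app, so build and run it on Cloud Functions or Cloud Run.
     • Deployment and scaling can be wired into a CI/CD pipeline such as GitHub Actions, and billing is basically pay-per-use, so it is easy on the wallet (around $5 per month) 👛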
  7. Dashboard app
     • The app itself is hosted on Cloud Run and implemented with Dash, a Python framework.
     • It reaches the backend (Cloud Functions) through API Gateway. The backend is a RESTful API built with Functions Framework.
     • The database is Firestore, populated from BigQuery by the ETL introduced later.
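The backend described here is a RESTful API in front of Firestore. Below is a rough sketch of the request-handling logic only, written as a plain function so that it runs without any Google Cloud dependency. The `STORE` dict stands in for a Firestore collection, and the `player_id` parameter and document fields are hypothetical; in a real deployment this logic would live inside an HTTP function served by Functions Framework.

```python
import json

# Stand-in for a Firestore collection; keys and fields are hypothetical.
STORE = {
    "p001": {"name": "Sample Pitcher", "pitches": 1200},
}


def get_player(query_params, store=STORE):
    """Return a (body, status, headers) tuple for GET /player?player_id=..."""
    headers = {"Content-Type": "application/json"}
    player_id = query_params.get("player_id")
    if not player_id:
        return json.dumps({"error": "player_id is required"}), 400, headers
    doc = store.get(player_id)
    if doc is None:
        return json.dumps({"error": "not found"}), 404, headers
    return json.dumps({"player_id": player_id, **doc}), 200, headers


body, status, _ = get_player({"player_id": "p001"})
print(status, body)
```

Keeping the lookup in a function that takes the store as an argument is one way to get the "easy to test" property the talk asks of each microservice.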
  8. Data collection & saving to BigQuery
     • A crawler (Cloud Functions) runs periodically to collect data from the source site (Baseball Savant).
     • The results are saved to Google Cloud Storage (GCS) as CSV. This is the source data (the data lake).
     • A PySpark script running on Dataproc Serverless summarizes and tidies the CSV files on GCS and saves the result to BigQuery.
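The summarizing step can be pictured with a small stand-in. The sketch below uses only the Python standard library in place of PySpark, and groups pitch-level rows into per-pitcher, per-pitch-type aggregates of the sort one might load into BigQuery. The column names and values are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented excerpt of a pitch-level CSV; column names are assumptions.
RAW_CSV = """pitcher,pitch_type,release_speed
100,SL,85.0
100,SL,87.0
100,FF,97.0
200,SI,93.5
"""


def summarize(csv_text):
    """Group by (pitcher, pitch_type); return pitch count and mean speed."""
    speeds = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["pitcher"], row["pitch_type"])
        speeds[key].append(float(row["release_speed"]))
    return [
        {"pitcher": p, "pitch_type": t, "pitches": len(v), "avg_speed": sum(v) / len(v)}
        for (p, t), v in sorted(speeds.items())
    ]


for record in summarize(RAW_CSV):
    print(record)
```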
  9. Loading into Firestore (moving data to the database)
     • A PySpark script running on Dataproc Serverless converts the BigQuery data into the format used by the dashboard (JSON).
     • A Python script then loads the results (saved on GCS as JSON) into Firestore.
     • The Dataproc and Firestore steps are run manually from local scripts (there were constraints that prevented full automation).
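The loading step can be sketched as parsing the exported JSON and grouping the documents into batches before writing them. This assumes the JSON on GCS is newline-delimited (the layout Spark's JSON writer produces); the document fields and the batch size are hypothetical, and the actual Firestore write is left out so the example runs on its own.

```python
import json

# Invented newline-delimited JSON, standing in for the PySpark output on GCS.
JSON_LINES = "\n".join(json.dumps({"id": f"doc{i}", "value": i}) for i in range(5))


def to_batches(json_lines, batch_size):
    """Parse one JSON document per line and group them into fixed-size batches."""
    docs = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


batches = to_batches(JSON_LINES, batch_size=2)
print([len(b) for b in batches])
```

Each batch would then be sent to Firestore in a single batched write, with the batch size tuned against Firestore's limits.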
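  10. Looking back on operations
     • The app side runs nicely on Cloud Run & Cloud Functions 👏
       To save money the resources are kept in a cold-standby state, but since this is for personal use it causes no trouble (and the setup lets me scale the instance count up or down by running CI).
     • The data side gets mixed reviews:
       • Data processing chained together "Pythagora Switch" style (like a Rube Goldberg machine) with Cloud Functions and Scheduler: ◎
       • BigQuery also works well; aggregation and similar jobs run without stress: ○
       • Using Dataproc Serverless at this scale may have been overkill?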
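  12. What I plan to do next
     • Unify the ETL on the "Pythagora Switch" chain of Cloud Functions & Cloud Scheduler
       • After actually using Dataproc, it turned out to be too much for my use case.
       • The data is simple CSV and small in volume, so Cloud Functions can process it.
     • Fully automate the data collection and processing flow, and manage the infrastructure with IaC
     • Enrich the statistical data built on BigQuery
       • Right now I mostly just query the raw data (creating views as needed).
       • Spark is available too, so I want to build out a somewhat smarter data mart.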
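  13. Summary of the talk
     • Major League Baseball has open data, which lets you check and evaluate the performance of players such as Shohei Ohtani.
     • To make analysis and visualization of that open data part of my daily routine, I built a data analysis platform on Google Cloud.
     • Building and operating a data platform on serverless architecture alone is feasible, and I strongly recommend it.
     Source: https://speakerdeck.com/shinyorke/pythonshi-inotamenosupotudetajie-xi-nokihon-pysparktomeziyarigudetawotian-ete-number-pyconjp-2022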