Building a Data Analytics Platform with Ruby (Ruby で作るデータ分析基盤)

Altech
July 14, 2018

Rails Developers Meetup 2018 Day 3 Extreme https://techplay.jp/event/679666

Transcript

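  1. ©2018 Wantedly, Inc. Self-introduction

    Sohei Takeno (@Altech_)
    - At Wantedly since [year missing]
    - Worked on Wantedly Visit development for [number missing] years
      (growth, search and recommendation, data platform, development platform, etc.)
    - Currently developing the backend of Wantedly People
    - Favorite programming language: Ruby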
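  2. Why we wanted data analysis

    - We want to change how the application behaves based on user actions
      - e.g. whether the user has passed an important conversion step
      - e.g. care at each fine-grained step on the way to a conversion
        - "if the user has viewed XX N times"
      - Optimization with machine learning
    - We want to offer analytics features to users
      - e.g. "yesterday's PV and the total PV to date"
      - e.g. "a PV ranking for comparison with others"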
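  3. Problems that came with growth

    - "Store the logs in the relational database"
      - This squeezes DB capacity and makes aggregation awkward, so it ends up as a half-measure
    - "Write a query that aggregates all of the intermediate data, then cache it with the Rails cache"
      - Over time: the data we thought of as intermediate becomes heavy in its own right
      - More users: concurrent accesses increase, and when the Rails cache expires the recomputation runs in parallel
      - Together, these once caused the DB to run out of memory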
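  4. Problems that came with growth

    - Promoting use of a data warehouse
      - TreasureData (from [year missing]), BigQuery (from [year missing])
    - As the organization and the service grow, more data analysis is done for decision-making
      - Many analysis queries are managed in a web UI
        - Nobody knows who wrote a query, with what intent, or how it was changed
        - A query is wrong, and a wrong decision follows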
  5. Usage

    - Write the result of a BigQuery query to a table on a daily basis

        export do
          table :daily_page_views
          columns [:day, :pv]
          mode :update, [:day]
        end

        schedule do
          frequency :daily
        end

        run :bq, <<SQL
        SELECT
          DATE(_WT_SCHEDULED_TIME, '+09:00') day,
          COUNT(*) AS pv
        FROM `log.accesses*`
        WHERE _TABLE_SUFFIX =
          FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY))
        SQL

    Note: _WT_SCHEDULED_TIME is a pseudo-variable that represents the job's execution time.
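The slide does not show how the `_WT_SCHEDULED_TIME` pseudo-variable is resolved. One plausible approach, sketched below, is to substitute a BigQuery TIMESTAMP literal into the SQL text before the query is submitted. The helper name `bind_scheduled_time` is hypothetical and is not taken from the talk.

```ruby
# Hypothetical helper (an assumption, not the talk's implementation):
# replaces the _WT_SCHEDULED_TIME pseudo-variable with a TIMESTAMP literal.
def bind_scheduled_time(sql, scheduled_time)
  stamp = scheduled_time.getutc.strftime("%Y-%m-%d %H:%M:%S")
  literal = "TIMESTAMP '#{stamp}+00:00'"
  sql.gsub(/\b_WT_SCHEDULED_TIME\b/, literal)
end

sql = <<~SQL
  SELECT DATE(_WT_SCHEDULED_TIME, '+09:00') AS day, COUNT(*) AS pv
  FROM `log.accesses*`
  WHERE _TABLE_SUFFIX =
    FORMAT_TIMESTAMP('%Y%m%d', TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY))
SQL

bound_sql = bind_scheduled_time(sql, Time.utc(2018, 7, 14))
puts bound_sql
```

Because the time is passed in rather than read from the clock, the same query text can be bound to any past day when a job has to be run again.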
  6. Usage

    - Add a job file and open a pull request
    - Once merged, it is registered as a job and runs periodically
    - A job can be run for any given day from the CLI

        $ ./bin/job run daily_page_view -s '2018-07-14'

    - A runner that executes arbitrary Ruby code

        run :proc, -> (scheduled_time) {
          # …
          return [[:foo, 1], [:bar, 2]]
        }
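As a rough illustration of the CLI and the `:proc` runner on this slide, the sketch below parses a command line of the same shape and passes the parsed time to a lambda. `parse_job_args` is a hypothetical name, and reading `-s` as "scheduled time" is an assumption based on the slide text. Only Ruby's standard `OptionParser` is used.

```ruby
require "optparse"
require "time"

# Hypothetical parser for `./bin/job run <job_name> -s <time>`.
# Treating -s as the scheduled time is an assumption, not confirmed by the talk.
def parse_job_args(argv)
  options = { scheduled_time: Time.now }
  parser = OptionParser.new
  parser.on("-s TIME") { |value| options[:scheduled_time] = Time.parse(value) }
  command, job_name = parser.permute(argv)
  options.merge(command: command, job_name: job_name)
end

# A :proc runner body like the one on the slide: the lambda receives the
# scheduled time and returns rows.
job_proc = ->(scheduled_time) {
  return [[:foo, 1], [:bar, 2]] # `return` exits only the lambda
}

args = parse_job_args(["run", "daily_page_view", "-s", "2018-07-14"])
rows = job_proc.call(args[:scheduled_time])
puts "#{args[:job_name]} for #{args[:scheduled_time].strftime('%Y-%m-%d')}: #{rows.inspect}"
```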
  7. Job quality

    - Is it possible to say "there was an infrastructure-level failure yesterday, so mechanically re-run the jobs related to XX"?
    - Encourage re-executability at the interface level
      - export uses update by default
      - run receives the execution time as an argument from outside
      - scheduling defaults to a coarse setting such as daily
    - Job files are loaded as Ruby objects, so they can also be processed programmatically

        jobs = Analytics::Backend::JobLoader.load_all
        daily_jobs = jobs.select { |job| job.schedule&.frequency == :daily }
        daily_jobs.each do |job|
          job.execute(scheduled_time: Time.new(2018, 1, 10), export: true)
        end
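The talk does not show how job files become Ruby objects or how update mode works internally. The sketch below is one way it could look: the DSL is evaluated with `instance_eval`, and update mode replaces rows that share the declared key, which is what makes re-running a job safe. All class and method names (`JobSketch`, `ExportSpec`, `ScheduleSpec`, `apply_update`) are hypothetical, and the row value 42 is made up.

```ruby
# Hypothetical sketch of a loader for the export/schedule/run DSL.
# Names are illustrative only; this is not Wantedly's implementation.
class ExportSpec
  attr_reader :table_name, :column_names, :mode_name, :keys

  def initialize
    @mode_name = :update # per the slide, update is the default for export
    @keys = []
  end

  def table(name)
    @table_name = name
  end

  def columns(names)
    @column_names = names
  end

  def mode(name, keys = [])
    @mode_name = name
    @keys = keys
  end
end

class ScheduleSpec
  # Acts as a setter inside the DSL block and as a reader elsewhere.
  def frequency(value = nil)
    @frequency = value if value
    @frequency
  end
end

class JobSketch
  attr_reader :export_spec, :runner

  def self.load(source)
    new.tap { |job| job.instance_eval(source) }
  end

  def export(&block)
    @export_spec = ExportSpec.new.tap { |spec| spec.instance_eval(&block) }
  end

  def schedule(&block)
    @schedule = ScheduleSpec.new.tap { |spec| spec.instance_eval(&block) } if block
    @schedule
  end

  def run(type, body)
    @runner = [type, body]
  end
end

# Update mode: rows sharing key values with incoming rows are replaced,
# so re-running a job for the same day leaves no duplicates.
def apply_update(table, new_rows, keys)
  incoming = new_rows.map { |row| row.values_at(*keys) }
  table.reject { |row| incoming.include?(row.values_at(*keys)) } + new_rows
end

job = JobSketch.load(<<~RUBY)
  export do
    table :daily_page_views
    columns [:day, :pv]
    mode :update, [:day]
  end
  schedule do
    frequency :daily
  end
  run :proc, ->(scheduled_time) { [[scheduled_time.strftime("%Y-%m-%d"), 42]] }
RUBY

time = Time.new(2018, 1, 10)
rows = job.runner.last.call(time).map { |r| job.export_spec.column_names.zip(r).to_h }
table = apply_update([], rows, job.export_spec.keys)
table = apply_update(table, rows, job.export_spec.keys) # re-run for the same day
p table
```

Keying the update on `:day` means the second run overwrites the first instead of appending to it, which matches the re-executability goal stated on the slide.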
  8. Using Kubernetes as the job execution platform

    - Jobs run on multiple clustered machines (efficiency and redundancy)
    - Kubernetes commands give a reasonable view of the registered jobs and their execution status

    Flow (run after merge): Ruby file -> generate -> YAML file -> apply -> Kubernetes

    Ruby file (cut off on the slide):

        export do
          table :daily_page_views
          columns [:day, :pv]
          mode :update, [:day]
        end

        schedule do
          frequency :daily
        end

        run :bq, <<SQL
        SELECT
          DATE(_WT_SCHEDULED_TIME, '+09:00') day,
          COUNT(*) AS pv
        FROM `log.accesses*`
        WHERE _TABLE_SUFFIX =

    YAML file (cut off on the slide):

        apiVersion: batch/v1beta1
        kind: CronJob
        metadata:
          name: visit--user-impressed-companies
          labels:
            namespace: visit
            basename: user_impressed_companies
            role: job
          namespace: analytics
        spec:
          schedule: "20 6,21 * * *"
          concurrencyPolicy: "Replace"
          suspend: false
          successfulJobsHistoryLimit: 10
          failedJobsHistoryLimit: 3
          jobTemplate:
            metadata:
              name: visit--user-impressed-companies
              labels:
                namespace: visit
                basename: user_impressed_companies
                role: job
            spec:
              backoffLimit: 5
              template:
                metadata:
                  name: visit--user-impressed-companies
                  labels:
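The "generate" step of this flow could be sketched as below: a Ruby hash is built per job and dumped with the standard `yaml` library. The function `cron_job_manifest` is hypothetical, and the naming rule (`<namespace>--<basename>` with underscores turned into hyphens) is inferred from the single example on the slide. The sketch stops at `backoffLimit` because the slide cuts off before the pod template. The `batch/v1beta1` API version is copied from the 2018 slide; current Kubernetes releases serve CronJob under `batch/v1`.

```ruby
require "yaml"

# Hypothetical generator (an assumption, not the talk's code): builds a
# CronJob manifest for one job using the field values shown on the slide.
def cron_job_manifest(namespace:, basename:, schedule:)
  name = "#{namespace}--#{basename.tr('_', '-')}" # naming rule inferred from the slide
  labels = -> { { "namespace" => namespace, "basename" => basename, "role" => "job" } }
  {
    "apiVersion" => "batch/v1beta1",
    "kind" => "CronJob",
    "metadata" => { "name" => name, "labels" => labels.call, "namespace" => "analytics" },
    "spec" => {
      "schedule" => schedule,
      "concurrencyPolicy" => "Replace",
      "suspend" => false,
      "successfulJobsHistoryLimit" => 10,
      "failedJobsHistoryLimit" => 3,
      "jobTemplate" => {
        "metadata" => { "name" => name, "labels" => labels.call },
        "spec" => { "backoffLimit" => 5 }
      }
    }
  }
end

manifest = cron_job_manifest(
  namespace: "visit",
  basename: "user_impressed_companies",
  schedule: "20 6,21 * * *"
)
manifest_yaml = manifest.to_yaml
puts manifest_yaml
```

The resulting file is what a command such as `kubectl apply -f` would consume in the "apply" step.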
  9. After adoption

    - It has stayed in continuous use
      - [number missing] jobs as of [year and month missing]
    - Regular feature extensions
      - Writing from BigQuery to BigQuery
      - import job: ETL from RDB to BigQuery
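  10. Summary

    - We separated the data analysis codebase and built a dedicated mechanism for it
      - An easy-to-use interface
      - Shorter deploy times
      - A unified workflow
      - Migration to a safer architecture
      - Better job quality
      - An extensible mechanism
    - As a result, we succeeded in decoupling the analytics side from the Rails app
    - There are still features we may want later, such as dependency management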