Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DigdagでETL処理をする
Search
tosametal
July 19, 2019
Technology
0
4.1k
DigdagでETL処理をする
データとML周辺エンジニアリングを考える会 #2
https://data-engineering.connpass.com/event/136756/
#data_ml_engineering
tosametal
July 19, 2019
Tweet
Share
More Decks by tosametal
See All by tosametal
マイクロアドのアドテクを支える技術
tosametal
1
190
Qiita Career Meetup for Server Side Engineers
tosametal
4
4.2k
Other Decks in Technology
See All in Technology
QAエンジニアがプロダクト専任で チームの中に入ると。。。?/登壇資料(杉森 太樹)
hacobu
PRO
0
210
AIエージェントによるエンタープライズ向けスライド検索!
shibuiwilliam
1
160
なぜThrottleではなくDebounceだったのか? 700並列リクエストと戦うサーバーサイド実装のすべて
yoshiori
12
4k
Proxmox × HCP Terraformで始めるお家プライベートクラウド
lamaglama39
1
190
クレジットカードの不正を防止する技術
yutadayo
16
7.2k
re:Invent完全攻略ガイド
junjikoide
1
310
QAセントラル組織が運営する自動テストプラットフォームの課題と現状
lycorptech_jp
PRO
0
390
Rubyist入門: The Way to The Timeless Way of Programming
snoozer05
PRO
6
400
明日から真似してOk!NOT A HOTELで実践している入社手続きの自動化
nkajihara
1
200
AI時代におけるドメイン駆動設計 入門 / Introduction to Domain-Driven Design in the AI Era
fendo181
0
680
[mercari GEARS 2025] Keynote
mercari
PRO
0
210
CodexでもAgent Skillsを使いたい
gotalab555
9
4.5k
Featured
See All Featured
Learning to Love Humans: Emotional Interface Design
aarron
274
41k
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
The World Runs on Bad Software
bkeepers
PRO
72
12k
Thoughts on Productivity
jonyablonski
73
4.9k
Code Reviewing Like a Champion
maltzj
527
40k
Designing Experiences People Love
moore
142
24k
Typedesign – Prime Four
hannesfritz
42
2.9k
Optimizing for Happiness
mojombo
379
70k
Fireside Chat
paigeccino
41
3.7k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
The Power of CSS Pseudo Elements
geoffreycrofte
80
6.1k
Java REST API Framework Comparison - PWX 2021
mraible
34
9k
Transcript
DigdagͰETLॲཧΛ͢Δ σʔλͱMLपลΤϯδχΞϦϯάΛߟ͑Δձ #2 2019.07.19 தᠳଠ(@tosametal) גࣜձࣾϚΠΫϩΞυ ΞϓϦέʔγϣϯΤϯδχΞ
ϚΠΫϩΞυʹ͓͚Δػցֶश ࠂ৴γεςϜʹ͓͚ΔCTR༧ଌɺCVR༧ଌɺෆਖ਼ΫϦοΫͷݕग़ͳͲ
ϩάج൫ͷߏ Imp Server Click Server RTB Server Kafka Hadoop (σʔλΣΞϋε)
Digdag Hadoop (ੳج൫)
ϩάج൫ͷߏ Imp Server Click Server RTB Server Kafka Hadoop (σʔλΣΞϋε)
Digdag Hadoop (ੳج൫) at least once ϢχʔΫͳIDʹΑΔॏෳഉআ sessionͰཧ ႈͳॲཧ Kafka secondaryͰ kafkaΛࢦఆ jsonܗࣜͷ ߏԽσʔλ
Digdagͱ digϑΝΠϧʹએݴతʹϫʔΫϑϩʔΛهड़ Workflow as code εέδϡʔϧ࣮ߦɺϦΧόϦ UI͔Βਐḿͷ֬ೝ࠶࣮ߦ͕Մೳ ΦϖϨʔλΛࣗ࡞Մೳ
PostgreSQL ࣮ߦཤྺͳͲΛอଘ Task͝ͱʹhadoopΫϥΠΞϯτ ͱͳΔίϯςφΛ্ཱͪ͛Δ εέʔϧΞτՄೳ όον࣮ߦج൫ߏ
ෳࡶͳґଘؔΛ੍ޚͭͭ͠ ϫʔΫϑϩʔͷՄಡੑΛอͭ
ϓϩδΣΫτΛػೳ୯ҐͰׂ ϓϩδΣΫτͱ In Digdag, workflows are packaged together with other
files used in the workflows. The files can be anything such as SQL scripts, Python/Ruby/Shell scripts, configuration files, etc. This set of the workflow definitions is called project. ެࣜυΩϡϝϯτ(http://docs.digdag.io/)ΑΓҾ༻ ϚΠΫϩΞυͰݱࡏ60ݸͷϓϩδΣΫτ͕ಈ͍͍ͯΔ
ϓϩδΣΫτͷґଘؔ schedule: daily>: 12:00:00 +task1: _parallel: true +subtask1: call>: subtask1.dig
+subtask2: call>: subtask2.dig +task2: echo>: task finished successfully •callΦϖϨʔλΛ͏͜ͱͰdigϑΝΠϧ ͷׂΛߦ͏͜ͱ͕Մೳ •requireΛ͏ͱ͏গ͠ෳࡶͳDAGͷ දݱՄೳ subtask1 subtask2 task2
ϓϩδΣΫτؒͷґଘؔ ϓϩδΣΫτA ϓϩδΣΫτB ଞͷϓϩδΣ Ϋτͷ݁ՌΛݟΔ ͜ͱग़དྷͳ͍
ϓϩδΣΫτؒͷґଘؔ +touch_task: s3_touch>: bucket/flag/fileX +wait_task: s3_wait>: bucket/flag/fileX ϓϩδΣΫτB ϓϩδΣΫτA fileX
ࣗ࡞ΦϖϨʔλ ࢀߟ:https://github.com/ tosametal/digdag-plugins
ͦͷଞ ϫʔΫϑϩʔશମΛႈʹ͢Δ • hiveΫΤϦinsert overwrite • distcpoverwrite deleteΦϓγϣϯΛࢦఆ ϦτϥΠΛઃఆ͢Δ •
exponential interval
·ͱΊ • ϓϩδΣΫτංେԽ͠ͳ͍Α͏ʹػೳͰׂ • ϓϩδΣΫτؒͷґଘs3_waitͰղܾ • Α͘͏ػೳϓϥάΠϯΛ࡞Ζ͏
None