Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DigdagでETL処理をする
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
tosametal
July 19, 2019
Technology
0
4.2k
DigdagでETL処理をする
データとML周辺エンジニアリングを考える会 #2
https://data-engineering.connpass.com/event/136756/
#data_ml_engineering
tosametal
July 19, 2019
Tweet
Share
More Decks by tosametal
See All by tosametal
マイクロアドのアドテクを支える技術
tosametal
1
220
Qiita Career Meetup for Server Side Engineers
tosametal
4
4.3k
Other Decks in Technology
See All in Technology
20260311 技術SWG活動報告(デジタルアイデンティティ人材育成推進WG Ph2 活動報告会)
oidfj
0
370
Claude Code のコード品質がばらつくので AI に品質保証させる仕組みを作った話 / A story about building a mechanism to have AI ensure quality, because the code quality from Claude Code was inconsistent
nrslib
13
8.6k
OpenClaw を Amazon Lightsail で動かす理由
uechishingo
0
200
Everything Claude Code を眺める
oikon48
12
7.9k
NewSQL_ ストレージ分離と分散合意を用いたスケーラブルアーキテクチャ
hacomono
PRO
4
400
Postman v12 で変わる API開発ワークフロー (Postman v12 アップデート) / New API development workflow with Postman v12
yokawasa
0
140
PMとしての意思決定とAI活用状況について
lycorptech_jp
PRO
0
140
複数クラスタ運用と検索の高度化:ビズリーチにおけるElastic活用事例 / ElasticON Tokyo2026
visional_engineering_and_design
0
170
"作る"から"使われる"へ:Backstage 活用の現在地
sbtechnight
0
190
【Λ(らむだ)】最近のアプデ情報 / RPALT20260318
lambda
0
100
Agent ServerはWeb Serverではない。ADKで考えるAgentOps
akiratameto
0
120
TypeScript 7.0の現在地と備え方
uhyo
7
1.8k
Featured
See All Featured
Ruling the World: When Life Gets Gamed
codingconduct
0
180
My Coaching Mixtape
mlcsv
0
78
Building Adaptive Systems
keathley
44
3k
We Analyzed 250 Million AI Search Results: Here's What I Found
joshbly
1
980
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
110
How To Speak Unicorn (iThemes Webinar)
marktimemedia
1
410
The agentic SEO stack - context over prompts
schlessera
0
700
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
74
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
55k
技術選定の審美眼(2025年版) / Understanding the Spiral of Technologies 2025 edition
twada
PRO
118
110k
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
2
250
Transcript
DigdagͰETLॲཧΛ͢Δ σʔλͱMLपลΤϯδχΞϦϯάΛߟ͑Δձ #2 2019.07.19 தᠳଠ(@tosametal) גࣜձࣾϚΠΫϩΞυ ΞϓϦέʔγϣϯΤϯδχΞ
ϚΠΫϩΞυʹ͓͚Δػցֶश ࠂ৴γεςϜʹ͓͚ΔCTR༧ଌɺCVR༧ଌɺෆਖ਼ΫϦοΫͷݕग़ͳͲ
ϩάج൫ͷߏ Imp Server Click Server RTB Server Kafka Hadoop (σʔλΣΞϋε)
Digdag Hadoop (ੳج൫)
ϩάج൫ͷߏ Imp Server Click Server RTB Server Kafka Hadoop (σʔλΣΞϋε)
Digdag Hadoop (ੳج൫) at least once ϢχʔΫͳIDʹΑΔॏෳഉআ sessionͰཧ ႈͳॲཧ Kafka secondaryͰ kafkaΛࢦఆ jsonܗࣜͷ ߏԽσʔλ
Digdagͱ digϑΝΠϧʹએݴతʹϫʔΫϑϩʔΛهड़ Workflow as code εέδϡʔϧ࣮ߦɺϦΧόϦ UI͔Βਐḿͷ֬ೝ࠶࣮ߦ͕Մೳ ΦϖϨʔλΛࣗ࡞Մೳ
PostgreSQL ࣮ߦཤྺͳͲΛอଘ Task͝ͱʹhadoopΫϥΠΞϯτ ͱͳΔίϯςφΛ্ཱͪ͛Δ εέʔϧΞτՄೳ όον࣮ߦج൫ߏ
ෳࡶͳґଘؔΛ੍ޚͭͭ͠ ϫʔΫϑϩʔͷՄಡੑΛอͭ
ϓϩδΣΫτΛػೳ୯ҐͰׂ ϓϩδΣΫτͱ In Digdag, workflows are packaged together with other
files used in the workflows. The files can be anything such as SQL scripts, Python/Ruby/Shell scripts, configuration files, etc. This set of the workflow definitions is called project. ެࣜυΩϡϝϯτ(http://docs.digdag.io/)ΑΓҾ༻ ϚΠΫϩΞυͰݱࡏ60ݸͷϓϩδΣΫτ͕ಈ͍͍ͯΔ
ϓϩδΣΫτͷґଘؔ schedule: daily>: 12:00:00 +task1: _parallel: true +subtask1: call>: subtask1.dig
+subtask2: call>: subtask2.dig +task2: echo>: task finished successfully •callΦϖϨʔλΛ͏͜ͱͰdigϑΝΠϧ ͷׂΛߦ͏͜ͱ͕Մೳ •requireΛ͏ͱ͏গ͠ෳࡶͳDAGͷ දݱՄೳ subtask1 subtask2 task2
ϓϩδΣΫτؒͷґଘؔ ϓϩδΣΫτA ϓϩδΣΫτB ଞͷϓϩδΣ Ϋτͷ݁ՌΛݟΔ ͜ͱग़དྷͳ͍
ϓϩδΣΫτؒͷґଘؔ +touch_task: s3_touch>: bucket/flag/fileX +wait_task: s3_wait>: bucket/flag/fileX ϓϩδΣΫτB ϓϩδΣΫτA fileX
ࣗ࡞ΦϖϨʔλ ࢀߟ:https://github.com/ tosametal/digdag-plugins
ͦͷଞ ϫʔΫϑϩʔશମΛႈʹ͢Δ • hiveΫΤϦinsert overwrite • distcpoverwrite deleteΦϓγϣϯΛࢦఆ ϦτϥΠΛઃఆ͢Δ •
exponential interval
·ͱΊ • ϓϩδΣΫτංେԽ͠ͳ͍Α͏ʹػೳͰׂ • ϓϩδΣΫτؒͷґଘs3_waitͰղܾ • Α͘͏ػೳϓϥάΠϯΛ࡞Ζ͏
None