ETL Processing with Digdag (DigdagでETL処理をする)
tosametal
July 19, 2019
Technology
Meetup on Data and ML-Adjacent Engineering #2 (データとML周辺エンジニアリングを考える会 #2)
https://data-engineering.connpass.com/event/136756/
#data_ml_engineering
Transcript
ETL Processing with Digdag
Meetup on Data and ML-Adjacent Engineering #2, 2019.07.19
@tosametal, Application Engineer, MicroAd, Inc.
Machine Learning at MicroAd
• CTR prediction, CVR prediction, fraudulent-click detection and more in the ad delivery system
Log Infrastructure Architecture
Components: Imp Server, Click Server, RTB Server, Kafka, Hadoop (data warehouse), Digdag, Hadoop (analytics platform)
• At-least-once delivery
• Deduplication by unique ID
• Managed by session
• Idempotent processing
• Kafka specified as the secondary
• Structured data in JSON format
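The log-infrastructure slide pairs at-least-once delivery with deduplication by a unique ID and idempotent processing. A minimal Python sketch of why that combination works, assuming JSON records that carry a hypothetical `id` field and a plain dict standing in for the warehouse table (neither detail comes from the talk):

```python
import json


def load_deduplicated(raw_lines, table):
    """Load JSON log lines into `table`, deduplicating by a unique ID.

    With at-least-once delivery the same record may arrive twice.
    Keying the table by the record's ID turns every write into an upsert,
    so replaying the same input leaves the table unchanged (idempotent).
    """
    for line in raw_lines:
        record = json.loads(line)
        # Hypothetical field name: assumes every record has a unique "id".
        table[record["id"]] = record
    return table


lines = [
    '{"id": "imp-1", "type": "imp"}',
    '{"id": "clk-1", "type": "click"}',
    '{"id": "imp-1", "type": "imp"}',  # the same event delivered a second time
]
first = load_deduplicated(lines, {})
replayed = load_deduplicated(lines, dict(first))  # rerun over the same batch
```

Loading the same batch twice yields the same table, which is what makes reruns and retries of the downstream ETL safe.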
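What Is Digdag?
• Workflows are written declaratively in .dig files (workflow as code)
• Scheduled execution and recovery
• Progress can be checked and tasks rerun from the UI
• You can write your own operators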
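Batch Execution Platform Architecture
• PostgreSQL stores execution history and similar state
• Each task starts a container that acts as a Hadoop client
• The platform can scale out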
Controlling Complex Dependencies While Keeping Workflows Readable
Splitting Projects by Function
What is a project? "In Digdag, workflows are packaged together with other files used in the workflows. The files can be anything such as SQL scripts, Python/Ruby/Shell scripts, configuration files, etc. This set of the workflow definitions is called project." (Quoted from the official documentation, http://docs.digdag.io/)
MicroAd currently runs 60 projects.
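Dependencies Within a Project

schedule:
  daily>: 12:00:00

+task1:
  _parallel: true
  +subtask1:
    call>: subtask1.dig
  +subtask2:
    call>: subtask2.dig

+task2:
  echo>: task finished successfully

• The call> operator lets you split a workflow across multiple .dig files
• With require>, somewhat more complex DAGs can also be expressed
(Figure: DAG in which subtask1 and subtask2 run in parallel and task2 runs after both)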
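The ordering that this .dig example expresses is a small DAG. A minimal Python sketch (not how Digdag itself schedules tasks) that uses the standard library's `graphlib` to group the tasks into waves that may run concurrently; the dependency map is an assumption about what `_parallel: true` under `+task1` and the `+task1` then `+task2` order imply:

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it must wait for.
# The children of +task1 have no ordering between them (_parallel: true),
# and +task2 starts only after both of them have finished.
dependencies = {
    "subtask1": set(),
    "subtask2": set(),
    "task2": {"subtask1", "subtask2"},
}


def execution_waves(graph):
    """Group tasks into waves; all tasks in one wave may run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
```

Adding edges to the map (the kind of extra ordering `require>` allows) changes the waves without touching the task bodies, which is the readability argument the slide makes for keeping dependencies declarative.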
Dependencies Between Projects
Project A and Project B: a project cannot see the results of another project.
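Dependencies Between Projects (continued)
One project creates a flag file (fileX) and the other project waits for it:

+touch_task:
  s3_touch>: bucket/flag/fileX

+wait_task:
  s3_wait>: bucket/flag/fileX

(Figure: Project A and Project B connected through the flag file fileX)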
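The flag-file handshake between projects can be sketched without S3. A minimal Python sketch, assuming a local directory stands in for `bucket/flag/fileX`; the polling interval and timeout are illustrative values, and `s3_touch>` does not appear among Digdag's built-in operators, so it is presumably one of the custom operators the talk refers to:

```python
import tempfile
import time
from pathlib import Path


def touch_flag(flag: Path) -> None:
    """Upstream project: signal completion by creating an empty flag file."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def wait_for_flag(flag: Path, poll_seconds: float = 0.05, timeout: float = 1.0) -> bool:
    """Downstream project: poll until the flag exists; False once `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if flag.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


with tempfile.TemporaryDirectory() as root:
    flag = Path(root) / "flag" / "fileX"        # local stand-in for bucket/flag/fileX
    before = wait_for_flag(flag, timeout=0.1)   # nothing written yet, so it times out
    touch_flag(flag)
    after = wait_for_flag(flag)                 # flag present, returns immediately
```

The two projects share only a path convention, so neither needs to read the other's results directly, which is the restriction noted on the previous slide.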
Custom Operators
Reference: https://github.com/tosametal/digdag-plugins
Other Tips
Make the whole workflow idempotent
• Hive queries: use INSERT OVERWRITE
• distcp: specify the -overwrite and -delete options
Configure retries
• Use an exponential interval
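These two tips depend on each other: automatic retries are only safe when re-running a step cannot corrupt its output. A minimal Python sketch of both ideas, assuming a dict partition stands in for a Hive table partition, a doubling wait schedule, and a hypothetical session-date partition key; in Digdag itself, retries are configured with the `_retry` setting (for example `limit`, `interval` and `interval_type: exponential`) rather than in code:

```python
import time


def overwrite_partition(table, partition, rows):
    """Replace a whole partition, in the spirit of Hive INSERT OVERWRITE.

    Writing the same rows twice gives the same result, so reruns are safe.
    """
    table[partition] = list(rows)


def run_with_retry(step, limit=3, interval=10.0, sleep=time.sleep):
    """Run `step`, retrying up to `limit` times with exponentially growing waits.

    Waits are interval, 2 * interval, 4 * interval, ... (an assumed schedule).
    """
    for attempt in range(limit + 1):
        try:
            return step()
        except Exception:
            if attempt == limit:
                raise  # retries exhausted: surface the last error
            sleep(interval * 2 ** attempt)


def demo():
    """Simulate a load that fails twice with a transient error, then succeeds."""
    table, waits, calls = {}, [], []

    def flaky_load():
        calls.append(1)
        if len(calls) < 3:
            raise RuntimeError("transient failure")
        # Hypothetical partition key derived from the session date.
        overwrite_partition(table, "2019-07-19", ["row-a", "row-b"])
        return "ok"

    # Record the waits instead of sleeping so the demo runs instantly.
    result = run_with_retry(flaky_load, limit=3, interval=1.0, sleep=waits.append)
    overwrite_partition(table, "2019-07-19", ["row-a", "row-b"])  # manual rerun
    return result, waits, table
```

Calling `demo()` returns `("ok", [1.0, 2.0], {"2019-07-19": ["row-a", "row-b"]})`: two growing waits before success, and a manual rerun leaves the partition unchanged because it overwrites rather than appends.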
Summary
• Split projects by function so that they do not bloat
• Resolve inter-project dependencies with s3_wait
• Build plugins for frequently used functionality