AWS Glueでリプレースしてみた (Trying a Replacement with AWS Glue) / gunosy-use-glue
aibou
December 25, 2017
Programming
Transcript
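AWS Glueでリプレースしてみた (Trying a Replacement with AWS Glue)
Gunosy Inc., Development Division

Who am I
• @aibou
• Doing "police work" on the SRE team
• No prior big-data experience
• I like watching American football

What I do
• Operate Gunosy and the ad servers
• Various activities aimed at automation and labor saving
• Infrastructure as code (codenize.tools, terraform)
• "Hello, this is the police..."
• OpsWorks and the three Kinesis siblings
• I would love to use X-Ray...

Agenda
• How Gunosy introduced Glue (as a side project, no less)
• Replacing the Athena partition-creation batch with Glue
• Replacing a homegrown ETL batch with Glue
• Converting the DynamoDB format with Glue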
① Replacing the Athena partition-creation batch with Glue
Flow of Gunosy's ad logs (rough overview)
[Diagram labels: API servers, Firehose, batch server, EMR (Spark), Athena, delivery candidates]
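Athena vs. Redshift: how we split usage
• Athena
  • For visualization
  • Used for aggregations over the distant past and over long periods
• Redshift
  • For application logic
  • Generates delivery candidates from recent data
  • When storage fills up, we respond by swapping out Redshift
  • Old data is discarded (a backup is kept in S3)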
Athena
• Logs are uploaded to S3 with fluent-plugin-s3
  • Per tag (imp, click)
  • Using Hive-friendly keys ( /click/year=2017/... )
• A scheduled batch runs ADD PARTITION
  • MSCK REPAIR is reliable but slow
  • MSCK REPAIR ≒ 2h, ADD PARTITION < 1s
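Athena partition-creation batch
• For some reason known internally as "Fully Automatic Aibou" (全自動あいぼう)
• CloudWatch Events + Lambda
  • The Lambda function is written in Java
  • Expressed as a DSL with codenize-tools/monosasi
• s3://bucket/path/to/year=2017/month=11/day=24/hour=18/
• I can publish the code if there is demand (though Glue is probably more convenient...)
• Can also handle the Firehose format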
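The original batch is Java on Lambda and its code is not shown in the slides. As a rough illustration of what such a batch has to produce each hour, here is a minimal Python sketch that builds one ADD PARTITION statement for a Hive-style key like the one above. The function name, the table name `click`, the zero-padded partition values, and the `IF NOT EXISTS` guard are all assumptions made for this example.

```python
from datetime import datetime


def add_partition_sql(table: str, base_location: str, ts: datetime) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement for one hourly partition."""
    # Assumption: partition values are zero-padded (month=03, hour=07).
    parts = {
        "year": f"{ts.year:04d}",
        "month": f"{ts.month:02d}",
        "day": f"{ts.day:02d}",
        "hour": f"{ts.hour:02d}",
    }
    spec = ", ".join(f"{key}='{value}'" for key, value in parts.items())
    path = "/".join(f"{key}={value}" for key, value in parts.items())
    location = f"{base_location.rstrip('/')}/{path}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Hypothetical table name; the bucket path mirrors the example key on the slide.
sql = add_partition_sql("click", "s3://bucket/path/to", datetime(2017, 11, 24, 18))
print(sql)
```

The resulting string would then be submitted to Athena by whatever client the batch uses; that part is left out because it depends on the runtime.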
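Glue DataCatalog
• Rate limits on both Athena and Lambda
  • Unless runs are deliberately staggered, they pile up at the same time
• Managing schemas and CloudWatch Events rules is painful
  • About 45 tables at present, and it is already tough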
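The slides do not say how the runs were staggered. One simple possibility, sketched here under that assumption, is to spread the per-table schedules evenly across the hour. The helper name and table names are hypothetical; the cron string follows the six-field CloudWatch Events form (minutes, hours, day-of-month, month, day-of-week, year).

```python
def staggered_schedules(tables, window_minutes=60):
    """Assign each table an hourly cron expression at a distinct minute offset."""
    schedules = {}
    for index, table in enumerate(tables):
        # Integer arithmetic keeps offsets evenly spaced inside the window.
        minute = (index * window_minutes // len(tables)) % 60
        schedules[table] = f"cron({minute} * * * ? *)"
    return schedules


# Hypothetical table names, for illustration only.
schedules = staggered_schedules(["imp", "click", "conv"])
print(schedules)
```

With about 45 tables and a 60-minute window, every table still gets its own minute, so no two partition batches start together.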
to Glue Crawler & DataCatalog
[Diagram labels: API servers, ALTER TABLE ADD PARTITION]
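Impressions
• Lambda is no longer needed (hooray)
• One crawler can handle multiple source data stores
  • It helps that crawlers do not proliferate
• Crawling takes a long time when the log volume is large
  • It currently takes 6 hours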
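What we could not do with Glue
• Add a table to Athena in a different region
  • Handled with the partition-adding batch described earlier
• Apply the DataCatalog to existing Athena tables
  • Requires recreating the database/table (not done yet because of the impact)
• Specify or change Glue table names
• Unusual formats are not supported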
② Replacing a homegrown ETL batch with Glue
JOINing logs with master data
• Log data: S3, Redshift
  • imp, click, etc.
• Master data: RDS
  • campaign, creative, etc.
• "We want to JOIN them in Athena and Redshift"
  • => Handled with Digdag + Embulk
Embulk + digdag (+ docker)
[Diagram labels: ALTER TABLE ADD PARTITION]
to Glue Crawler & Glue ETL
[Diagram labels: oregon region, tokyo region, replica, ①, ②; add table using the metadata from ②; ETL using the metadata from ①]
Additional
• stats tables: data already aggregated per day
• Upload only each day's data rather than every row of the table
• Use the Filter transform class in the ETL job
[Diagram labels: stats table, S3 path partitioned by year/month/day]
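Filter transformer class

    import datetime

    from awsglue.transforms import Filter

    # Assumption: "yesterday" was defined elsewhere in the original job script.
    yesterday = datetime.date.today() - datetime.timedelta(days=1)


    def filter_function(dynamic_record):
        # Keep only records whose "date" field falls on yesterday.
        return (dynamic_record["date"].strftime("%Y-%m-%d")
                == yesterday.strftime("%Y-%m-%d"))


    # datasource0 is the DynamicFrame created earlier in the Glue job script.
    filtered0 = Filter.apply(frame=datasource0, f=filter_function,
                             transformation_ctx="filtered0")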
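What was good about Glue
• ETL is completed within Glue alone
  • No Lambda needed
  • No Digdag + Embulk needed
  • No servers (ECS) needed
• Humans no longer have to enumerate tables
  • Until now everything was listed in Lambda or Embulk configuration files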
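What was not so good about Glue
• Filtering happens on the ETL side, not in the Crawler
  • The SQL issued is SELECT * FROM hoge_stats;
  • We would like to set a WHERE clause on the Crawler side
• A job that does not finish even after 18 hours (300 million records)
• Jobs cannot be cloned (we would like this fairly urgently)
  • Creating several similar jobs is tedious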
③ Converting the DynamoDB format with Glue (not really a replacement, though)
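We want to analyze DynamoDB data
• Tab-ordering information in Gunosy
  • Which kinds of users follow which tabs
• Tabs that are added automatically under specific conditions
  • Whether they work correctly
  • How many there are
• We want to aggregate this periodically (presumably per day)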
DynamoDB full dump and format
• Full dump with DataPipeline
  • DynamoDB Streams is something for later (hopefully)
• Data type descriptors -> JSON
  • Painful to use as is
  • Convert it (somehow)

    {
      "Item": {
        "Age": {"N": "8"},
        "Colors": {
          "L": [
            {"S": "White"},
            {"S": "Brown"},
            {"S": "Black"}
          ]
        },
        "Name": {"S": "Fido"},
        "Vaccinations": {
          "M": {
            "Rabies": {
              "L": [
                {"S": "2009-03-17"},
                {"S": "2011-09-21"},
                {"S": "2014-07-08"}
              ]
            },
            "Distemper": {"S": "2015-10-13"}
          }
        },
        "Breed": {"S": "Beagle"},
        "AnimalType": {"S": "Dog"}
      }
    }
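The slides do not show how the conversion was written; the next slide only hints at Map.apply and toDF / fromDF. As a plain-Python sketch of the idea, the function below strips the type descriptors from a record shaped like the example above. The names `unwrap` and `unwrap_item` are hypothetical, only a subset of descriptors is handled, and a real export may use different key casing or lack the `Item` wrapper.

```python
from decimal import Decimal


def unwrap(value: dict):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, inner), = value.items()  # each value carries exactly one type key
    if type_tag == "S":
        return inner
    if type_tag == "N":
        # Numbers arrive as strings; keep integers exact, otherwise use Decimal.
        number = Decimal(inner)
        return int(number) if number == number.to_integral_value() else number
    if type_tag == "BOOL":
        return bool(inner)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unwrap(item) for item in inner]
    if type_tag == "M":
        return {key: unwrap(item) for key, item in inner.items()}
    raise ValueError(f"unsupported DynamoDB type descriptor: {type_tag}")


def unwrap_item(record: dict) -> dict:
    """Flatten a record shaped like {"Item": {...}} into plain key/value pairs."""
    return {key: unwrap(value) for key, value in record["Item"].items()}


sample = {
    "Item": {
        "Age": {"N": "8"},
        "Colors": {"L": [{"S": "White"}, {"S": "Brown"}, {"S": "Black"}]},
        "Name": {"S": "Fido"},
        "Vaccinations": {"M": {"Distemper": {"S": "2015-10-13"}}},
    }
}
plain = unwrap_item(sample)
print(plain)
```

A function of this shape could in principle serve as the per-record callback of a Map-style transform, but that wiring is not shown in the slides.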
Data Pipeline + Glue ETL
[Diagram labels: Amazon DynamoDB, DataPipeline, Map.apply, toDF / fromDF]
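Implementing the ETL
• Sometimes the built-in Transformer classes alone are not enough
  • For more complex Filter and Map operations
  • Use toDF & fromDF (ordinary pyspark)
• try & except is mandatory when using job bookmarks
  • toDF fails (seemingly related to partitions)
  • We do not yet understand the use cases for job bookmarks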
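Pitfalls
• Exceptions raised inside callback functions of the built-in Transformer classes are swallowed
  • The dynamic record that raised the Exception is discarded and processing moves on to the next record
  • To debug, wrap the entire callback function in try-except
• Watch the subnet when using RDS as a Source DataStore
  • The so-called "Lambda in VPC" situation
  • Always make S3 reachable through a NAT Gateway or VPC Endpoint (for downloading the script)
  • For some reason network settings take about 30 minutes (unit presumed) to take effect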
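The advice to wrap the whole callback in try-except can be packaged as a small decorator so that every callback logs its own failures. This is a minimal sketch, not the speaker's code: the decorator name, the logger name, and the sample predicate are hypothetical, and where the log output ends up inside a Glue job is not covered here.

```python
import functools
import logging
import traceback

logger = logging.getLogger("glue_callback")


def log_exceptions(default):
    """Decorator factory: log any exception from a callback, then return `default`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(record):
            try:
                return func(record)
            except Exception:
                # Record the traceback and the offending record before moving on.
                logger.error("callback %s failed on %r\n%s",
                             func.__name__, record, traceback.format_exc())
                return default
        return wrapper
    return decorator


@log_exceptions(default=False)
def filter_function(record):
    # Hypothetical predicate: raises KeyError when "date" is missing.
    return record["date"] == "2017-12-24"


print(filter_function({"date": "2017-12-24"}))
print(filter_function({}))
```

Returning `False` as the default means a failing record is still filtered out, but the failure is now visible in the log instead of disappearing silently.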
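Impressions of Glue
• We would like to move all of our homegrown ETL batches to Glue
• Things like the DataCatalog are great
• But as it stands, a lot is painful
  • Execution time, sudden death, difficulty of development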
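Requests for Glue
• Above all, make it faster
• Support DynamoDB as a Source DataStore
• Version control for job scripts
• Easier job debugging (errors get swallowed)
• Better documentation for the built-in Transform classes
• Monitoring for Crawlers & ETL jobs
• Let us write in Scala too
  • We want to reuse our EMR (Spark) assets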