Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
2
1.9k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
家族の思い出を形にする 〜 1秒動画の生成を支えるインフラアーキテクチャ
ojima_h
3
2.1k
Railsの限界を超えろ!「家族アルバム みてね」の画像・動画の大規模アップロードを支えるアーキテクチャの変遷
ojima_h
5
1k
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
380
ディメンショナルモデリングのすすめ
ojima_h
8
4.9k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
7
5.8k
Other Decks in Technology
See All in Technology
What's new in Go 1.26?
ciarana
2
250
2026-02-24 月末 Tech Lunch Online #10 Cloud Runのデプロイの課題から考えるアプリとインフラの境界線
masasuzu
0
100
組織のSREを推進するためのPlatform EngineeringとEKS / Platform Engineering and EKS to drive SRE in your organization
chmikata
0
150
【SLO】"多様な期待値" と向き合ってみた
z63d
2
240
Introduction to Bill One Development Engineer
sansan33
PRO
0
370
AI時代のAPIファースト開発
nagix
2
670
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
1.1k
AI Agentにおける評価指標とAgent GPA
tsho
1
220
技術キャッチアップ効率化を実現する記事推薦システムの構築
yudai00
2
160
AWS CDK の目玉新機能「Mixins」とは / cdk-mixins
gotok365
2
290
Claude Codeと駆け抜ける 情報収集と実践録
sontixyou
2
1.2k
AIエージェントで変わる開発プロセス ― レビューボトルネックからの脱却
lycorptech_jp
PRO
2
780
Featured
See All Featured
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
133
19k
The AI Revolution Will Not Be Monopolized: How open-source beats economies of scale, even for LLMs
inesmontani
PRO
3
3.1k
How People are Using Generative and Agentic AI to Supercharge Their Products, Projects, Services and Value Streams Today
helenjbeal
1
130
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
78
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
199
72k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.6k
Avoiding the “Bad Training, Faster” Trap in the Age of AI
tmiket
0
95
The AI Search Optimization Roadmap by Aleyda Solis
aleyda
1
5.3k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
287
14k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
We Have a Design System, Now What?
morganepeng
55
8k
Test your architecture with Archunit
thirion
1
2.2k
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None