Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.9k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
320
ディメンショナルモデリングのすすめ
ojima_h
7
4.7k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.7k
Other Decks in Technology
See All in Technology
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
5
39k
Amplify Gen2から知るAWS CDK Toolkit Libraryの使い方/How to use the AWS CDK Toolkit Library as known from Amplify Gen2
fossamagna
1
350
united airlines ™®️ USA Contact Numbers: Complete 2025 Support Guide
flyunitedhelp
1
470
Figma Dev Mode MCP Serverを用いたUI開発
zoothezoo
0
230
[SRE NEXT 2025] すみずみまで暖かく照らすあなたの太陽でありたい
carnappopper
2
470
An introduction to Claude Code SDK
choplin
2
1k
「現場で活躍するAIエージェント」を実現するチームと開発プロセス
tkikuchi1002
3
300
安定した基盤システムのためのライブラリ選定
kakehashi
PRO
3
130
推し書籍📚 / Books and a QA Engineer
ak1210
0
140
クラウド開発の舞台裏とSRE文化の醸成 / SRE NEXT 2025 Lunch Session
kazeburo
1
580
CDK Toolkit Libraryにおけるテストの考え方
smt7174
1
550
スタックチャン家庭用アシスタントへの道
kanekoh
0
120
Featured
See All Featured
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Product Roadmaps are Hard
iamctodd
PRO
54
11k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.8k
Unsuck your backbone
ammeep
671
58k
The Cult of Friendly URLs
andyhume
79
6.5k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Statistics for Hackers
jakevdp
799
220k
Building Applications with DynamoDB
mza
95
6.5k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
22k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
340
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
830
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None