Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.9k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
Podのオートスケーリングに苦戦し続けている話
ojima_h
1
310
ディメンショナルモデリングのすすめ
ojima_h
7
4.7k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.7k
Other Decks in Technology
See All in Technology
生成AIでwebアプリケーションを作ってみた
tajimon
2
120
キャディでのApache Iceberg, Trino採用事例 -Apache Iceberg and Trino Usecase in CADDi--
caddi_eng
0
170
AWS CDK 実践的アプローチ N選 / aws-cdk-practical-approaches
gotok365
4
480
Azure AI Foundryでマルチエージェントワークフロー
seosoft
0
150
原則から考える保守しやすいComposable関数設計
moriatsushi
3
500
エンジニア向け技術スタック情報
kauche
0
110
In Praise of "Normal" Engineers (LDX3)
charity
2
1.2k
OAuth/OpenID Connectで実現するMCPのセキュアなアクセス管理
kuralab
5
850
Amazon S3標準/ S3 Tables/S3 Express One Zoneを使ったログ分析
shigeruoda
2
380
[TechNight #90-1] 本当に使える?ZDMの新機能を実践検証してみた
oracle4engineer
PRO
3
140
成立するElixirの再束縛(再代入)可という選択
kubell_hr
0
890
(非公式) AWS Summit Japan と 海浜幕張 の歩き方 2025年版
coosuke
PRO
1
330
Featured
See All Featured
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
281
13k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
26
2.8k
A better future with KSS
kneath
239
17k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
107
19k
Docker and Python
trallard
44
3.4k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
50k
VelocityConf: Rendering Performance Case Studies
addyosmani
330
24k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
The Straight Up "How To Draw Better" Workshop
denniskardys
233
140k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
700
Designing for Performance
lara
609
69k
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None