Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
データ分析基盤の変遷とデータレイクの作り方
Search
Ojima Hikaru
April 21, 2018
Technology
1
1.7k
データ分析基盤の変遷とデータレイクの作り方
Battle Conference U30 #2018
Ojima Hikaru
April 21, 2018
Tweet
Share
More Decks by Ojima Hikaru
See All by Ojima Hikaru
ディメンショナルモデリングのすすめ
ojima_h
7
4.3k
モンスターストライクを支えるデータ分析基盤と準リアルタイム集計
ojima_h
6
5.5k
Other Decks in Technology
See All in Technology
DuckDB雑紹介(1.1対応版)@DuckDB座談会
ktz
6
1.4k
OR学会2024秋_短期収益と将来のオフ方策評価性能を考慮したクーポン割当方策混合比の決定
recruitengineers
PRO
4
460
Jetpack Compose Modifier 徹底解説 / Jetpack Compose Modifier
wiroha
0
200
どこよりも遅めなWinActor Ver.7.5.0 新機能紹介
tamai_63
0
210
JEP 480: Structured Concurrency
aya_ebata
0
130
サーバー管理しないサーバーサービスManaged DevOps Pool
kkamegawa
0
130
JTCや セキュリティチェックリストが夢の跡
nikinusu
0
620
より快適なエラーログ監視を目指して
leveragestech
4
1.4k
Oracle Autonomous Database:サービス概要のご紹介
oracle4engineer
PRO
1
7.1k
PDF Viewer作成の今までとこれから
hunachi
0
470
アプリをリリースできる状態に保ったまま 段階的にリファクタリングするための 戦略と戦術 / Strategies and tactics for incremental refactoring
yanzm
6
1.4k
Road to Single Activity
yurihondo
1
240
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
89
16k
In The Pink: A Labor of Love
frogandcode
139
22k
Mobile First: as difficult as doing things right
swwweet
221
8.8k
A better future with KSS
kneath
235
17k
Code Review Best Practice
trishagee
62
16k
The Cost Of JavaScript in 2023
addyosmani
42
5.7k
GraphQLとの向き合い方2022年版
quramy
43
13k
How to train your dragon (web standard)
notwaldorf
85
5.6k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
166
48k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
8.9k
Large-scale JavaScript Application Architecture
addyosmani
508
110k
Embracing the Ebb and Flow
colly
83
4.4k
Transcript
L FG A
• S')1 0(6T • L>A9 XFLAG CDB=
!?NRK • GRD /%Q$7 • GRDO:>3GRD;<8H;C-,/ ACFM • P?/5#2(4&"Q 1+/GRDJPR • BIERN/ • @RIC. *6 / • GitHub: ojima-h 2
4 DAUKPI !
5
6 • • 2TB/day
30 → 1000
7 • 5
→ 100
− 8 S3
− 9 S3
− 10 Redshift
− 11
12 Data Lake Architecture
Data Lake " • -4,&$#!-4,+.' • -4,&% "%,(13*+)40&% !
(Schema on Read) • Data Lake -4,& DWH 24/$ $% 13
Data Lake 14 Hive Metastore
Hive Metastore 15
Hive " • Hadoop%(47-:.69!; • SQL ,*7&$S3 # HDFS !1:/
#1:/ & • ORC !3')83+:502& 16
Hive Metastore • S3/HDFS * "-SQL /1,&(.&0 (.&%)! •
,&(.& • * "- • * "-*#.+') • (.&%$.+ • 17
Hive Metastore • EMR ! Hive Metastore
! • • EMR 30 18
Hive Metastore • Hive Metastore MySQL
• Hive Metastore (HCatalog) server • EMR 5 19
Hive Metastore S3 20
Hive Metastore • ' • '"%
• 'ORC • '!&' ' !'#$$ 21
Hive Metastore • Hive Metastore S3 "
S3" !" 22
Hive Metastore * • "+$%- :>:>(*+ • 8C6*/,# •
3C;4' Hive DB / • Hive ).!% S3&*8C6/ • Hive &.( 8C6)-*@C@/ 23 3C;4 D=A49B<019?C2BBE 8C6579 8C6 Hive Database Table Partition S3 s3://BUCKET/warehouse/SERVICE.db/ s3://BUCKET/warehouse/SERVICE.db/TABLE/ s3://BUCKET/warehouse/SERVICE.db/TABLE/y=YYYY/m=MM/d=DD/
Hive Metastore • %)" &'&'%)" • &$#
! ( 24
Hive Metastore 1. Hive Metastore
25
Hive Metastore 1. Hive Metastore
2. 26
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 27
Hive Metastore 1. Hive Metastore
2. 3. Hive Metastore 4. 28
Hive Metastore ! 1. ),(! $ Hive Metastore # 2.
),($'*, 3. Hive Metastore ! $ 4. ),($ &%+ $ "),($ 29
Hive Metastore 30
Hive Metastore • Hive Redshift "%!$%# • Redshift
COPY "%! csv+gzip • Hive "%! ORC • Redshift csv+gzip Hive ORC ⇒ Redshift Spectrum 31
Redshift Spectrum • Redshift S3(#$+ &%*" • ',)+
Hive Metastore ! Hive ',)+" 32 CREATE EXTERNAL SCHEMA schema_name FROM HIVE METASTORE DATABASE 'database_name’ URI 'hive_metastore_uri’;
Hive Metastore • Redshift Hive 33 INSERT
INTO ‘Redshift ’ SELECT … FROM ‘Hive ’ WHERE y=YYYY AND m=MM AND d=DD;
Hive Metastore • Redshift Spectrum
Hive Metastore • Spark SQL • Presto • Athena • Flink 34
Hive Metastore Hive Metastore S3 Hive,
Redshift Spectrum , Spark 35
36
($) • Hive Metastore '25103-$251.4/4& • Hive Metastore , $"
Data Lake , !$# 251&*251&%+$#! Hive Metastore , +$# Data Lake , "$#(!6 37
None