Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Parquet: binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
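To make predicate and projection pushdown concrete, here is a minimal pure-Python sketch. It is not Parquet's implementation or API: the `ColumnChunk` class, the `scan` function, and the sample data are all hypothetical. It only assumes that each row group keeps min/max statistics per column, so a reader can skip whole row groups (predicate pushdown) and touch only the requested columns (projection pushdown).

```python
from dataclasses import dataclass


@dataclass
class ColumnChunk:
    """Hypothetical stand-in for one column's data inside a row group."""
    values: list
    min_value: object
    max_value: object


def make_chunk(values):
    # Compute min/max statistics once, when the chunk is written.
    return ColumnChunk(list(values), min(values), max(values))


# A toy "file": a list of row groups, each mapping column name -> chunk.
row_groups = [
    {"user_id": make_chunk([1, 2, 3]),
     "country": make_chunk(["FR", "FR", "US"]),
     "clicks": make_chunk([10, 0, 7])},
    {"user_id": make_chunk([4, 5, 6]),
     "country": make_chunk(["DE", "US", "US"]),
     "clicks": make_chunk([3, 9, 1])},
    {"user_id": make_chunk([7, 8, 9]),
     "country": make_chunk(["FR", "DE", "FR"]),
     "clicks": make_chunk([2, 2, 8])},
]


def scan(row_groups, columns, filter_column, lower_bound):
    """Return projected rows where filter_column >= lower_bound,
    together with the number of row groups skipped via statistics."""
    rows, skipped = [], 0
    for group in row_groups:
        filter_chunk = group[filter_column]
        # Predicate pushdown: no value in this group can match, so skip it.
        if filter_chunk.max_value < lower_bound:
            skipped += 1
            continue
        # Projection pushdown: read only the requested columns.
        projected = [group[name].values for name in columns]
        for i, value in enumerate(filter_chunk.values):
            if value >= lower_bound:
                rows.append(tuple(column[i] for column in projected))
    return rows, skipped


rows, skipped = scan(row_groups, columns=["country"],
                     filter_column="user_id", lower_bound=7)
print(rows, skipped)
```

In this toy run the first two row groups are skipped without reading any values, and the `clicks` column is never touched.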
Columnar Storage 101
Advantages:
- Limits I/O to only the data needed
- Big space savings: better compression, and faster, low-overhead encodings
- Enables vectorized execution engines
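The advantages listed above can be illustrated with a small pure-Python sketch. The sample rows, the `to_columns` helper, and the `run_length_encode` helper are hypothetical and chosen for illustration; the run-length encoding shown is a simplified stand-in, not Parquet's actual on-disk encoding.

```python
from itertools import groupby

# Row-oriented data: every query has to walk whole rows.
rows = [
    ("2015-06-01", "FR", 3),
    ("2015-06-01", "FR", 5),
    ("2015-06-01", "US", 1),
    ("2015-06-02", "US", 4),
]
names = ("date", "country", "clicks")


def to_columns(rows, names):
    """Pivot rows into one list per column (the columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


columns = to_columns(rows, names)

# Limited I/O: an aggregate over one column reads only that column.
total_clicks = sum(columns["clicks"])

# Better compression: values of one column are similar and often repeat.
encoded_dates = run_length_encode(columns["date"])
encoded_countries = run_length_encode(columns["country"])

print(total_clicks, encoded_dates, encoded_countries)
```

Because each column holds values of a single type that tend to repeat, even this naive encoding shrinks the `date` column from four entries to two pairs, while the sum over `clicks` never looks at the other columns.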
Parquet Model
Example Parquet Schema
Definition and Repetition Levels
- Definition level: stores the level at which the field is null, i.e. how many optional or repeated fields along the column's path are actually defined
- Repetition level: stores the level at which a new list starts in the column values
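As a rough illustration of definition and repetition levels, the sketch below shreds a single nested column into (repetition, definition, value) triples and rebuilds the nesting from them. The schema in the comment, the sample records, and both function names are assumptions made for this example; they are not taken from the slides, and the code is hard-wired to this one column rather than being a general implementation.

```python
# Hypothetical schema, written in Parquet's schema notation:
#
#   message AddressBook {
#     required string owner;
#     repeated group contacts {
#       required string name;
#       optional string phoneNumber;
#     }
#   }
#
# For the column contacts.phoneNumber:
#   max repetition level = 1 (one repeated field in the path: contacts)
#   max definition level = 2 (contacts is repeated, phoneNumber is optional)


def stripe_phone_numbers(records):
    """Shred contacts.phoneNumber into (repetition, definition, value) triples."""
    triples = []
    for record in records:
        contacts = record.get("contacts", [])
        if not contacts:
            # Nothing on the path is defined: definition level 0.
            triples.append((0, 0, None))
            continue
        for index, contact in enumerate(contacts):
            # 0 starts a new record, 1 continues the contacts list.
            repetition = 0 if index == 0 else 1
            phone = contact.get("phoneNumber")
            # 2: phone present; 1: contact exists but phone is null.
            definition = 2 if phone is not None else 1
            triples.append((repetition, definition, phone))
    return triples


def assemble_phone_numbers(triples):
    """Rebuild one list of phone numbers per record from the triples."""
    records = []
    for repetition, definition, value in triples:
        if repetition == 0:
            records.append([])          # a new record starts here
        if definition >= 1:
            records[-1].append(value)   # the contact exists (value may be None)
    return records


records = [
    {"owner": "Ann", "contacts": [
        {"name": "A", "phoneNumber": "555"},
        {"name": "B"},
    ]},
    {"owner": "Bob", "contacts": []},
    {"owner": "Cy", "contacts": [{"name": "C", "phoneNumber": "777"}]},
]

triples = stripe_phone_numbers(records)
rebuilt = assemble_phone_numbers(triples)
print(triples)
print(rebuilt)
```

The two small integers per value are enough to tell an empty contact list apart from a contact without a phone number, which is why the flat column can be reassembled into nested records.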
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project