Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
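To make predicate and projection pushdown concrete, here is a minimal sketch in plain Python. It is a toy model, not the Parquet API: `make_row_group` and `scan` are hypothetical names, and the per-column min/max statistics stand in for the statistics a real Parquet file keeps in its metadata. The idea shown is that a reader can skip whole row groups whose statistics rule out the predicate, and can touch only the columns the query asks for.

```python
def make_row_group(columns):
    """Build a toy row group: column data plus per-column (min, max) stats.

    `columns` maps a column name to a list of values (hypothetical layout).
    """
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return {"columns": columns, "stats": stats}


def scan(row_groups, select, column, lower):
    """Return `select` columns for rows where row[column] >= lower.

    Also returns how many row groups were actually read.
    """
    rows = []
    groups_read = 0
    for rg in row_groups:
        _, col_max = rg["stats"][column]
        if col_max < lower:
            # Predicate pushdown: stats prove no row can match, skip the group.
            continue
        groups_read += 1
        predicate_values = rg["columns"][column]
        for i, value in enumerate(predicate_values):
            if value >= lower:
                # Projection pushdown: only the selected columns are touched.
                rows.append(tuple(rg["columns"][name][i] for name in select))
    return rows, groups_read


row_groups = [
    make_row_group({"ts": [1, 2, 3], "user": ["a", "b", "c"], "payload": ["x", "y", "z"]}),
    make_row_group({"ts": [10, 11, 12], "user": ["d", "e", "f"], "payload": ["u", "v", "w"]}),
]

rows, groups_read = scan(row_groups, select=["user"], column="ts", lower=11)
print(rows, groups_read)
```

In this sketch the first row group is never scanned because its maximum `ts` is below the threshold, and the `payload` column is never read at all.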
Columnar Storage 101
Advantages:
- Limits I/O to only the data that is needed
- Big space savings: better compression, plus faster, low-overhead encodings
- Enables vectorized execution engines
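The compression advantage can be illustrated with a small sketch in plain Python. This is an assumption-laden toy, not Parquet's actual encoder: the sample rows and the helper names `to_columns`, `rle_encode` and `rle_decode` are made up for illustration. It pivots rows into columns and run-length encodes one column, showing why values of the same type stored next to each other tend to compress well.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row tuples into a list of column tuples."""
    return list(zip(*rows))


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical ad-impression rows: (country, event, user_id).
rows = [
    ("FR", "view", 101),
    ("FR", "view", 102),
    ("FR", "click", 103),
    ("US", "view", 104),
    ("US", "view", 105),
]

country, event, user_id = to_columns(rows)
encoded_country = rle_encode(country)
print(encoded_country)
```

In the row layout the country values are interleaved with the other fields; in the column layout they sit side by side, so five values collapse into two runs. A query that needs only `country` also never has to read `event` or `user_id`.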
Parquet Model
Example Parquet Schema
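Definition and Repetition Levels
- Definition level: records how many optional or repeated fields along a column's path are actually defined, so it identifies the level at which a value is null.
- Repetition level: records at which repeated field in the path a value starts a new list (0 marks the start of a new record).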
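As a worked illustration of the two levels, the sketch below shreds and reassembles a single column in plain Python. It assumes a hypothetical schema with one column, `outer.inner`, where `outer` is a repeated group and `inner` is a repeated integer, so the maximum definition and repetition levels are both 2. The function names `shred` and `assemble` are illustrative and follow the Dremel approach in spirit; this is not Parquet's implementation.

```python
def shred(records):
    """Flatten list-of-lists records into (repetition, definition, value) triples.

    Assumed schema: repeated group outer { repeated int inner }.
    """
    triples = []
    for record in records:
        if not record:
            # No `outer` at all: nothing defined, new record starts.
            triples.append((0, 0, None))
            continue
        for i, outer in enumerate(record):
            # 0 = first value of a new record, 1 = new `outer` entry.
            r_outer = 0 if i == 0 else 1
            if not outer:
                # `outer` is defined but holds no `inner` values.
                triples.append((r_outer, 1, None))
                continue
            for j, value in enumerate(outer):
                # 2 = value continues the current `inner` list.
                r = r_outer if j == 0 else 2
                triples.append((r, 2, value))
    return triples


def assemble(triples):
    """Rebuild the nested records from (repetition, definition, value) triples."""
    records = []
    for r, d, value in triples:
        if r == 0:
            records.append([])        # repetition 0: a new record begins
        if d >= 1 and r <= 1:
            records[-1].append([])    # a new `outer` entry begins
        if d == 2:
            records[-1][-1].append(value)  # a real `inner` value
    return records


records = [[[1, 2], [3]], [], [[], [4]]]
triples = shred(records)
for t in triples:
    print(t)
```

The empty record and the empty inner list each still produce one triple with a `None` value; the definition level alone tells the reader how deep the structure goes before it stops, which is what lets `assemble` restore the original nesting.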
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project
BACKUP SLIDES