Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Parquet: a binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
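As a rough illustration of what predicate and projection pushdown mean, here is a minimal pure-Python sketch. It does not use the Parquet library; `ROW_GROUPS`, `column_stats`, and `scan` are hypothetical names, and the "file" is just a list of in-memory row groups. A real columnar file would store the min/max statistics in its metadata instead of recomputing them.

```python
# Hypothetical stand-in for a columnar file: a list of row groups,
# each mapping a column name to the values of that column.
ROW_GROUPS = [
    {"user_id": [1, 2, 3], "country": ["FR", "FR", "US"], "clicks": [10, 0, 7]},
    {"user_id": [4, 5, 6], "country": ["DE", "US", "US"], "clicks": [55, 80, 61]},
]


def column_stats(row_group, column):
    """Return (min, max) for one column of one row group.

    Assumption: recomputed here for simplicity; a real file format
    would read precomputed statistics from its metadata.
    """
    values = row_group[column]
    return min(values), max(values)


def scan(row_groups, columns, filter_column, lower_bound):
    """Return the requested columns for rows where filter_column >= lower_bound."""
    result = []
    groups_read = 0
    for group in row_groups:
        _, max_value = column_stats(group, filter_column)
        # Predicate pushdown: skip the whole row group when its
        # statistics prove that no row can satisfy the filter.
        if max_value < lower_bound:
            continue
        groups_read += 1
        # Projection pushdown: only the filter column and the requested
        # columns are touched; "country" is never read in the call below.
        for i, value in enumerate(group[filter_column]):
            if value >= lower_bound:
                result.append(tuple(group[c][i] for c in columns))
    return result, groups_read


rows, groups_read = scan(ROW_GROUPS, ["user_id"], "clicks", 50)
print(rows, groups_read)
```

The first row group is skipped entirely because its largest `clicks` value is below the bound, so only one of the two groups is scanned.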
Columnar Storage 101
Advantages:
- Limits I/O to only the data needed
- Big space savings: better compression, and faster, low-overhead encodings
- Enables vectorized execution engines
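To make the compression argument concrete, here is a small sketch that turns rows into columns and run-length encodes each column. The sample data and the helper names `to_columns` and `run_length_encode` are made up for illustration; this is a plain run-length encoding, not the exact encoding Parquet writes.

```python
from itertools import groupby

# Hypothetical log rows: (country, HTTP status).
rows = [
    ("FR", 200), ("FR", 200), ("FR", 404),
    ("US", 200), ("US", 200), ("US", 200),
]


def to_columns(rows, names):
    """Transpose row-oriented tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


columns = to_columns(rows, ["country", "status"])
encoded_country = run_length_encode(columns["country"])
encoded_status = run_length_encode(columns["status"])
print(encoded_country)
print(encoded_status)
```

Values of the same column sit next to each other, so runs of repeated values appear naturally, and a query that needs only `country` never has to read `status`.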
Parquet Model
Example Parquet Schema
Definition and Repetition Levels
- Definition level: stores the level at which the field is null, i.e. how many optional or repeated fields along the column's path are actually defined.
- Repetition level: stores the level at which a new list starts in the column values.
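The two levels are easier to follow on a concrete case. The sketch below assumes a hypothetical two-level schema, a repeated group `level1` containing a repeated string `level2`, so a record is simply a list of lists of strings. The function `shred` is an illustrative name, not part of any Parquet API; it emits one `(value, repetition_level, definition_level)` triple per column entry.

```python
def shred(record):
    """Flatten one record (a list of lists of strings) into
    (value, repetition_level, definition_level) triples.

    Assumed schema: repeated group level1 { repeated string level2; }
    The maximum definition level is 2 (both level1 and level2 defined).
    """
    if not record:
        # No level1 element at all: nothing on the path is defined.
        return [(None, 0, 0)]
    triples = []
    for i, inner in enumerate(record):
        # 0 marks the start of a new record, 1 a new level1 element.
        outer_rep = 0 if i == 0 else 1
        if not inner:
            # level1 exists but its level2 list is empty: defined up to level 1.
            triples.append((None, outer_rep, 1))
            continue
        for j, value in enumerate(inner):
            # 2 means the value continues the current level2 list.
            rep = outer_rep if j == 0 else 2
            triples.append((value, rep, 2))
    return triples


full = shred([["a", "b", "c"], ["d", "e", "f", "g"]])
partial = shred([[], ["x"]])
empty = shred([])
print(full)
print(partial)
print(empty)
```

A repetition level of 0 always marks the first entry of a new record, and a definition level below the maximum marks a null or empty position, which is why no value needs to be stored for it.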
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project
BACKUP SLIDES