Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Technology
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
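As a rough illustration of predicate and projection pushdown, here is a toy sketch in plain Python. It does not use the Parquet libraries: `RowGroup`, `scan`, and the column names are hypothetical, and the min/max statistics that a real file would keep in its metadata are simply computed on the fly.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy row group (hypothetical): one list of values per column."""
    columns: dict  # column name -> list of values

    def stats(self, name):
        # Assumption: a real file stores min/max in metadata; here we compute them.
        col = self.columns[name]
        return min(col), max(col)


def scan(row_groups, wanted, filter_col, lower_bound):
    """Return the wanted columns for rows where filter_col >= lower_bound."""
    rows, skipped = [], 0
    for rg in row_groups:
        _, highest = rg.stats(filter_col)
        if highest < lower_bound:
            # Predicate pushdown: no row in this group can match, skip it entirely.
            skipped += 1
            continue
        for i, value in enumerate(rg.columns[filter_col]):
            if value >= lower_bound:
                # Projection pushdown: only the requested columns are touched.
                rows.append(tuple(rg.columns[c][i] for c in wanted))
    return rows, skipped


row_groups = [
    RowGroup({"ts": [1, 2, 3], "user": ["a", "b", "c"], "payload": ["x", "y", "z"]}),
    RowGroup({"ts": [10, 11, 12], "user": ["d", "e", "f"], "payload": ["p", "q", "r"]}),
]
rows, skipped = scan(row_groups, wanted=["user"], filter_col="ts", lower_bound=11)
print(rows, skipped)
```

The first row group is skipped on its statistics alone, and the `payload` column is never read.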
Columnar Storage 101
Advantages:
- Limits I/O to only the data needed
- Big space savings: better compression and fast, low-overhead encodings
- Enables vectorized execution engines
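The advantages above can be sketched with a small in-memory example. This is an illustration only, with made-up data and hypothetical names (`rows`, `columns`, `run_length_encode`). It shows that an aggregate over one field only needs that column, and that a low-cardinality column collapses under run-length encoding.

```python
from itertools import groupby

# Made-up sample data: (country, clicks) records.
rows = [
    ("fr", 10), ("fr", 12), ("fr", 7),
    ("us", 3), ("us", 9), ("us", 4),
]

# Row-oriented: summing one field still walks every full record.
row_total = sum(record[1] for record in rows)

# Column-oriented: each field is stored contiguously, so the sum reads one list.
columns = {
    "country": [record[0] for record in rows],
    "clicks": [record[1] for record in rows],
}
col_total = sum(columns["clicks"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Similar values sit next to each other in a column, so simple encodings work well.
encoded = run_length_encode(columns["country"])
print(row_total, col_total, encoded)
```

Six country values shrink to two runs here. The same encoding applied to interleaved row data would find no runs to merge.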
Parquet Model
Example Parquet Schema
Definition and Repetition Levels
Definition level: stores the level at which the field is null.
Repetition level: stores the level at which a new list starts in the column values.
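A minimal sketch of how these two levels could be computed, in the style of the Dremel paper. It assumes a hypothetical schema with a repeated group `a` containing an optional field `b`, so the column `a.b` has a maximum definition level of 2 and a maximum repetition level of 1. The functions `stripe` and `assemble` are illustrative and not part of any Parquet API.

```python
def stripe(records):
    """Flatten column a.b into repetition levels, definition levels and values."""
    rep, defn, values = [], [], []
    for record in records:
        items = record.get("a") or []
        if not items:
            # The list itself is empty: nothing on the path is defined.
            rep.append(0)
            defn.append(0)
            continue
        for i, item in enumerate(items):
            # 0 starts a new record, 1 continues the current list.
            rep.append(0 if i == 0 else 1)
            b = item.get("b")
            if b is None:
                defn.append(1)  # 'a' is defined, 'b' is null
            else:
                defn.append(2)  # the full path is defined
                values.append(b)
    return rep, defn, values


def assemble(rep, defn, values):
    """Rebuild the records from the levels, to check that nothing was lost."""
    records, remaining = [], iter(values)
    for r, d in zip(rep, defn):
        if r == 0:
            records.append({"a": []})
        if d >= 1:
            records[-1]["a"].append({"b": next(remaining) if d == 2 else None})
    return records


records = [
    {"a": [{"b": 1}, {"b": None}, {"b": 3}]},
    {"a": []},
    {"a": [{"b": 5}]},
]
rep, defn, values = stripe(records)
print(rep, defn, values)
```

Nulls are never stored as values: they are implied by a definition level below the maximum, which is why `values` holds only three entries for five level pairs.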
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project
BACKUP SLIDES