Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Intro to Parquet (June 2015)
Search
Sam Bessalah
April 06, 2016
Technology
320
0
Share
Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
More Decks by Sam Bessalah
See All by Sam Bessalah
Streaming Platforms
samklr
0
380
High Performance RPC with Finagle
samklr
1
220
Dotscale 2015 Lightning - Distributed Systems Research
samklr
1
830
Datageeks_27-05.pdf
samklr
0
75
Big data and Machine learning APIs
samklr
4
290
Scalable Machine Learning
samklr
2
260
mesos.devoxx.2014
samklr
2
290
Algebird : Abstract Algebra for Big Data Analytics.
samklr
9
3k
Algebra for analytics
samklr
1
310
Other Decks in Technology
See All in Technology
20260415_生成AIを専属DSに_自動レポート作成_ハンズオン_交通事故データ
doradora09
PRO
0
110
Introduction to Sansan, inc / Sansan Global Development Center, Inc.
sansan33
PRO
0
3k
レビューしきれない?それは「全て人力でのレビュー」だからではないでしょうか
amixedcolor
0
280
ネットワーク運用を楽にするAWS DevOps Agent活用法!! / 20260421 Masaki Okuda
shift_evolve
PRO
2
190
KGDC_13_Amazon Q Developerで挑む! 13事例から見えたAX組織変革の最前線_公開情報
kikugawa
0
110
[OpsJAWS 40]リリースしたら終わり、じゃなかった。セキュリティ空白期間をAWS Security Agentで埋める
sh_fk2
3
210
Azure Lifecycle with Copilot CLI
torumakabe
3
990
EBS暗号化に失敗してEC2が動かなくなった話
hamaguchimmm
2
170
Azure PortalなどにみるWebアクセシビリティ
tomokusaba
0
370
サイボウズ 開発本部採用ピッチ / Cybozu Engineer Recruit
cybozuinsideout
PRO
10
78k
DevOpsDays Tokyo 2026 軽量な仕様書と新たなDORA AI ケイパビリティで実現する、動くソフトウェアを中心とした開発ライフサイクル / DevOpsDays Tokyo 2026
n11sh1
0
150
Introduction to Sansan for Engineers / エンジニア向け会社紹介
sansan33
PRO
6
74k
Featured
See All Featured
The Impact of AI in SEO - AI Overviews June 2024 Edition
aleyda
5
790
The Straight Up "How To Draw Better" Workshop
denniskardys
239
140k
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
310
Measuring & Analyzing Core Web Vitals
bluesmoon
9
810
Building Experiences: Design Systems, User Experience, and Full Site Editing
marktimemedia
0
480
Taking LLMs out of the black box: A practical guide to human-in-the-loop distillation
inesmontani
PRO
3
2.1k
The Curse of the Amulet
leimatthew05
1
11k
Efficient Content Optimization with Google Search Console & Apps Script
katarinadahlin
PRO
1
490
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
180
How GitHub (no longer) Works
holman
316
150k
Redefining SEO in the New Era of Traffic Generation
szymonslowik
1
270
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
27
3.4k
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Typical Data workflow
Typical Data workflow
Typical Data workflow
Multiple Data Format
Big Data Data Format Zoo - Sequence Files
these formats provide
None
Binary, columnar storage format for big data analytics workloads, inspired
by the Google Dremel Paper. - Language independent - Processing framework independent - Formally specified - More than a columnar storage : Dynamic partionning, automatic predicate and projections push down - Awesome performance
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101
Columnar Storage 101 Advantages : - Limits I/O to the
data only needed - Big Space savings, better compression, and faster and low overhead encodings - Enables vectorized engine
Columnar Storage 101
None
Parquet Model
Example Parquet Schema
None
None
Definition and Repetition Levels Definition Level : Stores the level
for which the field is null Repetition Level : Store levels when new lists are starting in column values.
None
None
None
None
None
None
Numbers Example: Appnexus 2 MM Logs of Ads impressions 270
TB of Log Data in Protobuf on HDFS http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
simple bench with HIVE
None
None
Disk Space usage on HDFS with 128 MB blocks
None
None
None
None
None
None
Slides shamelessly cloned from Julien Le Dem(@J_) , Lead of
the Apache Parquet Project
BACKUP SLIDES
None
None
None
None
None
None
None
None
None
None
None
None