Intro to Parquet (June 2015)
Sam Bessalah
April 06, 2016
Transcript
Sam BESSALAH @samklr http://parquet.apache.org
Typical Data workflow
Multiple Data Formats
Big Data Data Format Zoo - Sequence Files
these formats provide
Binary, columnar storage format for big data analytics workloads, inspired by the Google Dremel paper.
- Language independent
- Processing framework independent
- Formally specified
- More than columnar storage: dynamic partitioning, automatic predicate and projection pushdown
- Awesome performance
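To make the predicate and projection pushdown bullet concrete, here is a minimal Python sketch. It is not Parquet's reader: it assumes a toy in-memory "file" made of row groups that each carry per-column min/max statistics, and the names `make_row_group` and `scan` are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List

def make_row_group(columns: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Build a toy row group: column data plus per-column min/max statistics."""
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return {"columns": columns, "stats": stats}

def scan(row_groups, projection, filter_col, lo, hi):
    """Return projected rows where lo <= filter_col <= hi.

    Predicate pushdown: skip row groups whose min/max range cannot match.
    Projection pushdown: touch only the requested columns and the filter column.
    """
    rows, groups_read = [], 0
    for rg in row_groups:
        col_min, col_max = rg["stats"][filter_col]
        if col_max < lo or col_min > hi:
            continue  # whole row group skipped without looking at its data
        groups_read += 1
        filter_vals = rg["columns"][filter_col]
        projected = [rg["columns"][name] for name in projection]
        for i, v in enumerate(filter_vals):
            if lo <= v <= hi:
                rows.append(tuple(col[i] for col in projected))
    return rows, groups_read

# Hypothetical data: three row groups ordered by timestamp.
row_groups = [
    make_row_group({"ts": [1, 2, 3], "user": ["a", "b", "c"], "bytes": [10, 20, 30]}),
    make_row_group({"ts": [4, 5, 6], "user": ["d", "e", "f"], "bytes": [40, 50, 60]}),
    make_row_group({"ts": [7, 8, 9], "user": ["g", "h", "i"], "bytes": [70, 80, 90]}),
]

rows, groups_read = scan(row_groups, ["user"], "ts", 5, 6)
print(rows, groups_read)
```

In this sketch only the middle row group is read, and the `bytes` column is never touched, which is the I/O saving the two pushdowns aim for.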
Columnar Storage 101
Columnar Storage 101: Advantages
- Limits I/O to only the data needed
- Big space savings: better compression and faster, low-overhead encodings
- Enables vectorized execution engines
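The advantages above can be sketched in a few lines of Python. This is an illustration under assumptions, not Parquet's on-disk layout: the sample rows are invented, and `to_columns` and `run_length_encode` are hypothetical helper names. It shows that an aggregate over one column reads only that column, and that a column of similar values compresses well with a simple run-length encoding.

```python
from itertools import groupby

# Hypothetical row-oriented records, as a row store would hold them.
rows = [
    {"country": "FR", "clicks": 3},
    {"country": "FR", "clicks": 5},
    {"country": "FR", "clicks": 1},
    {"country": "US", "clicks": 7},
    {"country": "US", "clicks": 2},
]

def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    return {name: [r[name] for r in records] for name in records[0]}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

columns = to_columns(rows)

# Only the "clicks" column is read; "country" is never scanned.
total_clicks = sum(columns["clicks"])

# Values of one column sit next to each other, so repeats collapse.
encoded_country = run_length_encode(columns["country"])

print(total_clicks)
print(encoded_country)
```

Here five country values shrink to two (value, count) pairs, and the sum runs over one contiguous list of integers, the kind of loop a vectorized engine can process in batches.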
Parquet Model
Example Parquet Schema
Definition and Repetition Levels
- Definition level: stores the level at which the field is null
- Repetition level: stores the level at which a new list starts in the column values
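A small Python sketch of how these levels can be computed, following the Dremel-style scheme for an assumed two-level schema: each record holds a repeated `names` group, and each name holds a repeated `codes` string. The schema, the sample records, and the functions `shred` and `assemble` are hypothetical illustrations, not Parquet's writer code. For the column `names.codes` the maximum definition level and the maximum repetition level are both 2.

```python
def shred(records):
    """Stripe records into (value, repetition_level, definition_level) triples.

    Each record is a list of names; each name is a list of code strings.
    """
    out = []
    for names in records:
        if not names:
            out.append((None, 0, 0))  # record has no names at all
            continue
        for i, codes in enumerate(names):
            name_rep = 0 if i == 0 else 1  # 0: new record, 1: new name
            if not codes:
                out.append((None, name_rep, 1))  # name defined, codes missing
                continue
            for j, code in enumerate(codes):
                rep = name_rep if j == 0 else 2  # 2: next code in same name
                out.append((code, rep, 2))
    return out

def assemble(triples):
    """Rebuild the nested records from the striped column."""
    records = []
    for value, rep, definition in triples:
        if rep == 0:
            records.append([])        # start a new record
        if definition >= 1 and rep <= 1:
            records[-1].append([])    # start a new name
        if definition == 2:
            records[-1][-1].append(value)
    return records

records = [
    [["en-us", "en"], [], ["en-gb"]],  # three names, the second has no codes
    [],                                # no names
    [["fr"]],
]

triples = shred(records)
for t in triples:
    print(t)
```

The round trip `assemble(shred(records)) == records` holds for this sample, which shows that the flat column plus its two small integer levels is enough to recover the nesting and the nulls.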
Numbers
Example: AppNexus
- 2 MM logs of ad impressions
- 270 TB of log data in Protobuf on HDFS
http://techblog.appnexus.com/blog/2015/03/31/parquet-columnar-storage-for-hadoop-data/
Simple benchmark with Hive
Disk Space usage on HDFS with 128 MB blocks
Slides shamelessly cloned from Julien Le Dem (@J_), lead of the Apache Parquet project
BACKUP SLIDES