$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
スタートアップを支える技術戦略と組織づくり
pospome
8
13k
flutter_kaigi_2025.pdf
kyoheig3
2
380
『実践MLOps』から学ぶ DevOps for ML
nsakki55
2
490
CSC305 Lecture 15
javiergs
PRO
0
210
[SF Ruby Conf 2025] Rails X
palkan
0
380
Promise.tryで実現する新しいエラーハンドリング New error handling with Promise try
bicstone
3
1.7k
TypeScript 5.9 で使えるようになった import defer でパフォーマンス最適化を実現する
bicstone
1
540
全員アーキテクトで挑む、 巨大で高密度なドメインの紐解き方
agatan
8
11k
MAP, Jigsaw, Code Golf 振り返り会 by 関東Kaggler会|Jigsaw 15th Solution
hasibirok0
0
150
dnx で実行できるコマンド、作ってみました
tomohisa
0
110
最新のDirectX12で使えるレイトレ周りの機能追加について
projectasura
0
310
競馬で学ぶ機械学習の基本と実践 / Machine Learning with Horse Racing
shoheimitani
14
14k
Featured
See All Featured
The World Runs on Bad Software
bkeepers
PRO
72
12k
Visualization
eitanlees
150
16k
The Cost Of JavaScript in 2023
addyosmani
55
9.3k
VelocityConf: Rendering Performance Case Studies
addyosmani
333
24k
Imperfection Machines: The Place of Print at Facebook
scottboms
269
13k
jQuery: Nuts, Bolts and Bling
dougneiner
65
8k
Being A Developer After 40
akosma
91
590k
Faster Mobile Websites
deanohume
310
31k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
Optimising Largest Contentful Paint
csswizardry
37
3.5k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4.1k
Code Review Best Practice
trishagee
72
19k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you