Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
Agent Rules as Domain Parser
yodakeisuke
1
350
What Spring Developers Should Know About Jakarta EE
ivargrimstad
1
620
TSConfig Solution Style & subpath imports to switch types on a per-file basis
maminami373
1
180
ワイがおすすめする新潟の食 / 20250530phpconf-niigata-eve
kasacchiful
0
260
型付け力を強化するための Hoogle のすゝめ / Boosting Your Type Mastery with Hoogle
guvalif
1
230
Language Server と喋ろう – TSKaigi 2025
pizzacat83
2
670
複雑なフォームを継続的に開発していくための技術選定・設計・実装 #tskaigi / #tskaigi2025
izumin5210
12
6.4k
少数精鋭エンジニアがフルスタック力を磨く理由 -そしてAI時代へ-
rebase_engineering
0
130
ユーザーにサブドメインの ECサイトを提供したい (あるいは) 2026年函館で一番熱くなるかもしれない言語の話
uvb_76
0
180
TypeScript製IaCツールのAWS CDKが様々な言語で実装できる理由 ~他言語変換の仕組み~ / cdk-language-transformation
gotok365
7
380
Zennの運営完全に理解した #完全に理解したTalk
wadayusuke
1
140
Doma で目指す ORM 最適解
nakamura_to
1
160
Featured
See All Featured
Build your cross-platform service in a week with App Engine
jlugia
231
18k
Rebuilding a faster, lazier Slack
samanthasiow
81
9k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
180
53k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Making the Leap to Tech Lead
cromwellryan
134
9.3k
Optimising Largest Contentful Paint
csswizardry
37
3.3k
How STYLIGHT went responsive
nonsquared
100
5.6k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
25
2.8k
Embracing the Ebb and Flow
colly
85
4.7k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
5
620
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
14
1.5k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you