Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
なまけものオバケたち -PHP 8.4 に入った新機能の紹介-
tanakahisateru
1
140
Mermaid x AST x 生成AI = コードとドキュメントの完全同期への道
shibuyamizuho
1
500
Kaigi on Railsに初参加したら、その日にLT登壇が決定した件について
tama50505
0
130
CQRS+ES の力を使って効果を感じる / Feel the effects of using the power of CQRS+ES
seike460
PRO
0
220
Cloudflare MCP ServerでClaude Desktop からWeb APIを構築
kutakutat
1
620
Zoneless Testing
rainerhahnekamp
0
150
KubeCon + CloudNativeCon NA 2024 Overviewat Kubernetes Meetup Tokyo #68 / amsy810_k8sjp68
masayaaoyama
0
290
歴史と現在から考えるスケーラブルなソフトウェア開発のプラクティス
i10416
0
220
Amazon S3 NYJavaSIG 2024-12-12
sullis
0
130
非ブラウザランタイムとWeb標準 / Non-Browser Runtimes and Web Standards
petamoriken
0
410
20年もののレガシープロダクトに 0からPHPStanを入れるまで / phpcon2024
hirobe1999
0
950
開発者とQAの越境で自動テストが増える開発プロセスを実現する
92thunder
1
220
Featured
See All Featured
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Learning to Love Humans: Emotional Interface Design
aarron
274
40k
Large-scale JavaScript Application Architecture
addyosmani
510
110k
Adopting Sorbet at Scale
ufuk
74
9.1k
Why Our Code Smells
bkeepers
PRO
335
57k
How GitHub (no longer) Works
holman
312
140k
Designing on Purpose - Digital PM Summit 2013
jponch
116
7k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
Scaling GitHub
holman
459
140k
Reflections from 52 weeks, 52 projects
jeffersonlam
348
20k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
7
540
The Invisible Side of Design
smashingmag
299
50k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you