MapReduce and Columnar DB's
samant
April 02, 2014
Programming
Transcript
MapReduce and Columnar DB’s
Amant Stéphane @stephamant
Summary
• MapReduce
• Columnar DB’s
• Practical Use Case
MapReduce
MapReduce - Definition
• One of Google’s greatest contributions to computer science
• MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
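
As a rough illustration of the definition above, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using word counting as the job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a thread pool only stands in for the several nodes a real cluster would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group the emitted values by key across all mapper outputs."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk stands in for the data held by one node (assumption of
    # this sketch); the worker threads stand in for the nodes themselves.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

The map and reduce functions never share state, which is what lets a framework run them on different machines and only move data during the shuffle.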
MapReduce - Major Implementation
• Almost always based on Hadoop, an Apache framework for the storage and processing of large-scale distributed data
• Itself inspired by Google’s MapReduce and Google File System (GFS) papers
Columnar DB’s
Columnar DB’s - Definition
Columnar databases are so named because the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
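
The contrast between the two layouts can be sketched with plain Python structures. This is only a toy model under simple assumptions (every record has the same fields, everything fits in memory); the sample records and the helper names `to_columns`, `add_column` and `average` are invented for the illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "alice", "age": 30},
    {"id": 2, "name": "bob", "age": 25},
    {"id": 3, "name": "carol", "age": 41},
]


def to_columns(records):
    """Pivot records into a column-oriented layout: one list per column."""
    names = records[0].keys()
    return {name: [record[name] for record in records] for name in names}


def add_column(columns, name, default=None):
    """Add a column by creating one new list; existing columns are untouched."""
    row_count = len(next(iter(columns.values())))
    columns[name] = [default] * row_count


def average(columns, name):
    """Aggregate a single column without reading any of the others."""
    values = columns[name]
    return sum(values) / len(values)


if __name__ == "__main__":
    columns = to_columns(rows)
    add_column(columns, "email")
    print(columns["age"], average(columns, "age"))
```

In the row layout, adding a field means touching every record; in the column layout it is a single new list, which is the intuition behind "adding columns is quite inexpensive".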
Columnar DB’s - Queries (HBase shell)
get 't1', 'r1', {COLUMN => 'c1'}
get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
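
To make the options in these queries concrete, the following toy in-memory model mimics how a get can select columns, an exact timestamp, a time range, and a maximum number of versions. It is not an HBase client: the `ToyTable` class and its methods are hypothetical, and the half-open time range (start included, end excluded) is an assumption of this sketch.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one table: row -> column -> {timestamp: value}."""

    def __init__(self):
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Store one version of a cell under the given timestamp."""
        self._cells[row][column][ts] = value

    def get(self, row, columns, timestamp=None, timerange=None, versions=1):
        """Return {column: [(ts, value), ...]} with the newest version first.

        timestamp: keep only the version written at exactly this timestamp.
        timerange: (start, end) keeps versions with start <= ts < end.
        versions:  maximum number of versions returned per column.
        """
        if isinstance(columns, str):
            columns = [columns]
        result = {}
        for column in columns:
            history = self._cells.get(row, {}).get(column, {})
            matches = sorted(history.items(), reverse=True)
            if timestamp is not None:
                matches = [(ts, v) for ts, v in matches if ts == timestamp]
            if timerange is not None:
                start, end = timerange
                matches = [(ts, v) for ts, v in matches if start <= ts < end]
            if matches:
                # Columns with no matching version are simply absent.
                result[column] = matches[:versions]
        return result


if __name__ == "__main__":
    t1 = ToyTable()
    for ts, value in [(1, "a"), (2, "b"), (3, "c")]:
        t1.put("r1", "c1", value, ts)
    print(t1.get("r1", "c1", timerange=(1, 3), versions=4))
```

Because every cell is a small history keyed by timestamp, asking for older versions is just a filter over that history rather than a separate feature.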
Columnar DB’s - Major Implementation
• Cassandra
• Hypertable
• HBase
Columnar DBs - Supporting Companies
• Facebook
• Yahoo
• eBay
• Twitter
• Amazon
• Google
• ...
Columnar DB’s - Pros
• Horizontal scalability (replication and partitioning)
• Versioning is trivial
• No real storage cost for null values
• Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Cons
• Complexity (installation, infrastructure and usage)
• You must design your schema based on how you plan to query the data
• Some operations are very time-consuming
Practical Use Case
Facebook Messaging Index Table

             | Keyword #1           | Keyword #2           | Keyword #3           | Keyword #...
User ID #1   | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id
User ID #2   | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id
User ID #... | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id | Timestamp Message_id
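
One possible reading of this table is: one row per user, one column per keyword, and each cell holding timestamped message ids. The sketch below models that reading in memory. The `MessageIndex` class, the naive `tokenize` helper and the sample ids are all hypothetical; they do not describe Facebook's actual implementation.

```python
from collections import defaultdict


def tokenize(text):
    """Split a message body into lowercase keywords (naive whitespace split)."""
    return {word.strip(".,!?").lower() for word in text.split()} - {""}


class MessageIndex:
    """Row key = user id, column = keyword, cell = (timestamp, message_id) versions."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def index_message(self, user_id, message_id, timestamp, text):
        """Record the message under every keyword it contains, for this user."""
        for keyword in tokenize(text):
            self._rows[user_id][keyword].append((timestamp, message_id))

    def search(self, user_id, keyword):
        """Return the user's message ids containing the keyword, newest first."""
        cells = self._rows.get(user_id, {}).get(keyword.lower(), [])
        return [message_id for _, message_id in sorted(cells, reverse=True)]


if __name__ == "__main__":
    index = MessageIndex()
    index.index_message("u1", "m1", 100, "Lunch tomorrow?")
    index.index_message("u1", "m2", 200, "Lunch was great!")
    print(index.search("u1", "lunch"))
```

The schema is shaped entirely by the query it serves ("which of this user's messages contain this word?"): the lookup touches one row and one column, and never scans other users' data.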
References
Eric Redmond and Jim R. Wilson, Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
Thank you