Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
samant
April 02, 2014
Programming
0
1.4k
MapReduce and Columnar DB's
samant
April 02, 2014
Tweet
Share
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.2k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.3k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
負債になりにくいCSSをデザイナとつくるには?
fsubal
10
2.4k
Unity Android XR入門
sakutama_11
0
160
Ruby on cygwin 2025-02
fd0
0
150
How mixi2 Uses TiDB for SNS Scalability and Performance
kanmo
38
14k
Conform を推す - Advocating for Conform
mizoguchicoji
3
690
Formの複雑さに立ち向かう
bmthd
1
860
一休.com のログイン体験を支える技術 〜Web Components x Vue.js 活用事例と最適化について〜
atsumim
0
530
GAEログのコスト削減
mot_techtalk
0
120
ML.NETで始める機械学習
ymd65536
0
110
Djangoアプリケーション 運用のリアル 〜問題発生から可視化、最適化への道〜 #pyconshizu
kashewnuts
1
250
苦しいTiDBへの移行を乗り越えて快適な運用を目指す
leveragestech
0
650
1年目の私に伝えたい!テストコードを怖がらなくなるためのヒント/Tips for not being afraid of test code
push_gawa
0
210
Featured
See All Featured
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
120k
Building Applications with DynamoDB
mza
93
6.2k
GraphQLとの向き合い方2022年版
quramy
44
13k
Imperfection Machines: The Place of Print at Facebook
scottboms
267
13k
It's Worth the Effort
3n
184
28k
Embracing the Ebb and Flow
colly
84
4.6k
Building Adaptive Systems
keathley
40
2.4k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
1k
BBQ
matthewcrist
87
9.5k
Building a Scalable Design System with Sketch
lauravandoore
461
33k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
193
16k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you