MapReduce and Columnar DB's
samant
April 02, 2014
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition
• One of Google’s greatest contributions to computer science
• MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
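The definition above can be made concrete with a word count, the usual introductory MapReduce job. This is a minimal single-machine sketch, not how Hadoop or Google run it: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative, and a thread pool stands in for the separate nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle: group every emitted value by its key, across all chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk stands in for the slice of data held by one node;
    # the map calls are independent, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["map reduce map", "reduce columnar", "map"]))
```

The map calls share no state, which is what lets a real framework spread them over many machines; only the shuffle step has to move data between them.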
MapReduce - Major Implementation
• Almost always based on Hadoop, an Apache-supported framework for the storage and processing of large-scale distributed data
• Itself inspired by Google’s MapReduce and Google File System (GFS) papers
Columnar DB’s
Columnar DB’s - Definition
Columnar databases are so named because the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
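The contrast between the two layouts can be sketched in a few lines. This is a toy in-memory illustration under assumed sample data (the `rows` records and the helper names `to_columns` and `add_column` are invented for the example), not the storage format of any real database.

```python
# Row-oriented: everything about one record is kept together.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Linus", "age": 54},
    {"id": 3, "name": "Grace", "age": 45},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for index, record in enumerate(records):
        for name, value in record.items():
            # Missing values stay None, so every column has the same length.
            columns.setdefault(name, [None] * len(records))[index] = value
    return columns


def add_column(columns, name, values):
    """Adding a column creates one new list; existing columns are untouched."""
    columns[name] = list(values)


# Column-oriented: all values of one column sit next to each other.
columns = to_columns(rows)

# An aggregate over one column reads only that column's data.
average_age = sum(columns["age"]) / len(columns["age"])

add_column(columns, "country", ["UK", "FI", "US"])
```

In the row layout, adding `country` would mean rewriting every record; in the column layout it is a single new list, which is the sense in which adding columns is inexpensive.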
Columnar DB’s - Queries
get 't1', 'r1', {COLUMN => 'c1'}
get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
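To show what these query options select, here is a toy in-memory model of a table whose cells keep several timestamped versions. It is not the HBase client API: `ToyTable` and its keyword arguments are hypothetical names that only mirror the options on the slide, and the time range is assumed to include its lower bound and exclude its upper bound.

```python
class ToyTable:
    """In-memory stand-in: row -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row, column, value, timestamp):
        cells = self._rows.setdefault(row, {}).setdefault(column, {})
        cells[timestamp] = value

    def get(self, row, columns=None, timestamp=None, timerange=None, versions=1):
        """Return {column: [(timestamp, value), ...]}, newest version first."""
        result = {}
        for column, cells in self._rows.get(row, {}).items():
            if columns is not None and column not in columns:
                continue
            stamps = sorted(cells, reverse=True)  # newest first
            if timestamp is not None:
                stamps = [ts for ts in stamps if ts == timestamp]
            if timerange is not None:
                low, high = timerange  # assumed: low inclusive, high exclusive
                stamps = [ts for ts in stamps if low <= ts < high]
            picked = [(ts, cells[ts]) for ts in stamps[:versions]]
            if picked:
                result[column] = picked
        return result


table = ToyTable()
for ts in range(1, 6):
    table.put("r1", "c1", f"v{ts}", ts)
table.put("r1", "c2", "x", 3)

latest = table.get("r1", columns=["c1"])
at_ts3 = table.get("r1", columns=["c1"], timestamp=3)
ranged = table.get("r1", columns=["c1"], timerange=(2, 5), versions=4)
```

Here `latest` holds only the newest `c1` cell, `at_ts3` the single version written at timestamp 3, and `ranged` the versions with timestamps 4, 3 and 2 (up to four were requested, but only three fall inside the range).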
Columnar DB’s - Major Implementation
• Cassandra
• Hypertable
• HBase
Columnar DB’s - Supporting Companies
• Facebook
• Yahoo
• eBay
• Twitter
• Amazon
• Google
• ...
Columnar DB’s - Pro’s
• Horizontal scalability (replication and partitioning)
• Versioning is trivial
• No real storage cost for null values
• Used mainly for Big Data / data mining / Business Intelligence analysis
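The first point, partitioning plus replication, can be sketched as a function that decides which nodes hold a given row. This is a deliberately naive hash-modulo placement chosen for brevity; the `owners` function and the node names are hypothetical, and real systems use more elaborate schemes (Cassandra places rows on a token ring, HBase splits tables into key ranges).

```python
import hashlib


def owners(row_key, nodes, replicas=2):
    """Return the nodes holding a row: a primary plus the following replicas."""
    # A stable hash, so every client computes the same placement.
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + offset) % len(nodes)] for offset in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = {key: owners(key, nodes) for key in ("user:1", "user:2", "user:3")}
```

Because placement depends only on the row key, any client can find a row without a central lookup, and losing one node still leaves a replica of each of its rows on a neighbour.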
Columnar DB’s - Con’s
• Complexity (installation, infrastructure and usage)
• The schema must be designed around how you plan to query the data
• Some operations are very time-consuming
Practical Use Case
Facebook Messaging Index Table
              Keyword #1            Keyword #2            Keyword #3            Keyword #...
User ID #1    Timestamp Message_id  Timestamp Message_id  Timestamp Message_id  Timestamp Message_id
User ID #2    Timestamp Message_id  Timestamp Message_id  Timestamp Message_id  Timestamp Message_id
User ID #...  Timestamp Message_id  Timestamp Message_id  Timestamp Message_id  Timestamp Message_id
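The index table above reads as: one row per user, one column per keyword, and in each cell a timestamp with a message id. A minimal in-memory sketch of that shape follows; the `MessageIndex` class, its method names and the whitespace tokenizer are assumptions made for illustration, not Facebook's implementation.

```python
from collections import defaultdict


class MessageIndex:
    """Row key = user id, column = keyword, cells = (timestamp, message_id)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def index_message(self, user_id, message_id, timestamp, text):
        # Naive tokenizer: lowercase, split on whitespace, one cell per keyword.
        for keyword in set(text.lower().split()):
            self._rows[user_id][keyword].append((timestamp, message_id))

    def search(self, user_id, keyword):
        """Return the user's message ids containing the keyword, newest first."""
        cells = self._rows.get(user_id, {}).get(keyword.lower(), [])
        return [message_id for _, message_id in sorted(cells, reverse=True)]


index = MessageIndex()
index.index_message("u1", "m1", 100, "lunch tomorrow")
index.index_message("u1", "m2", 200, "Lunch today")
index.index_message("u2", "m3", 150, "lunch")
```

A search touches exactly one row and one column, which illustrates the earlier point that the schema is designed around the query: the table is shaped so that "messages of this user containing this word" is a single lookup.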
References
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, by Eric Redmond and Jim R. Wilson
Thank you