Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MapReduce and Columnar DB's
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
samant
April 02, 2014
Programming
1.5k
0
Share
Embed
Copy iframe code
Copy JS code
Copy link
Start on current slide
MapReduce and Columnar DB's
samant
April 02, 2014
More Decks by samant
See All by samant
Introduction to Firebase (May contain some pieces of AppGyver and Polymer)
samant
0
1.3k
Why Ruby on Rails? - Feweb - September 2014
samant
1
1.4k
Beloved JS - JavaScript…. What else?
samant
1
1.7k
WTF: OOCSS like a boss !
samant
7
1.5k
WTF: Document DBs
samant
0
2k
WTF: Rails App Templates
samant
2
2.8k
Other Decks in Programming
See All in Programming
依存関係から依存物へ―Dependencyという言葉の歴史をひも解く
j_lee
0
130
1B+ /day規模のログを管理する技術
broadleaf
0
110
才能?センス?知らん、 続けたもん勝ちだ。-- 結婚・出産・癌を越えてなお、私がプロダクトを創り続ける理由
16bitidol
1
200
技術記事、 専門家としてのプログラマ、 言語化
mizchi
13
6.5k
Vite+ Unified Toolchain for the Web
naokihaba
0
340
Make SRE Operations Easier with Azure SRE Agent
kkamegawa
0
7.8k
Observability in Practice:Grafana 與 Edge Device SRE 的那些事
blueswen
0
170
Datadog × OpenTelemetry 入門と実践のあいだ
kn_to_maxpno
1
180
RTSPクライアントを自作してみた話
simotin13
0
630
そのテスト、説明できますか?~LWテスト戦略FW~のご紹介
nakahara
0
160
Performance Engineering for Everyone
elenatanasoiu
0
210
はてなアカウント基盤 State of the Union
cockscomb
0
630
Featured
See All Featured
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.8k
Designing Powerful Visuals for Engaging Learning
tmiket
1
420
The Illustrated Guide to Node.js - THAT Conference 2024
reverentgeek
1
390
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
49
10k
Writing Fast Ruby
sferik
630
63k
How to Think Like a Performance Engineer
csswizardry
28
2.7k
Designing for humans not robots
tammielis
254
26k
Automating Front-end Workflow
addyosmani
1370
210k
We Have a Design System, Now What?
morganepeng
55
8.2k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
16
2k
Building AI with AI
inesmontani
PRO
1
1.1k
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
2.1k
Transcript
MapReduce and Columnar DB’s Amant Stéphane @stephamant
Summary • MapReduce • Columnar DB’s • Practical Use Case
MapReduce
MapReduce - Definition • One of Google’s greatest contributions to
computer science • MapReduce is an algorithmic framework for executing jobs in parallel over several nodes
MapReduce
MapReduce
MapReduce - Major Implementation • Almost always based on Hadoop
- a Framework for the storage and processing of large scaled and distributed data supported by Apache • Itself inspired by Google BigTable Project
Columnar DB’s
Columnar DB’s - Definition Columnar databases are so named because
the important aspect of their design is that data from a given column is stored together. (By contrast, a row-oriented database keeps information about a row together.) In column-oriented databases, adding columns is quite inexpensive.
Columnar DB’s - Definition
Columnar DB’s - Definition
Columnar DB’s - Queries get ‘t1′, ‘r1′, {COLUMN => ‘c1′}
get ‘t1′, ‘r1′, {COLUMN => ['c1', 'c2', 'c3']} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMERANGE => [ts1, ts2], VERSIONS => 4} get ‘t1′, ‘r1′, {COLUMN => ‘c1′, TIMESTAMP => ts1, VERSIONS => 4}
Columnar DB’s - Major Implementation • Cassandra • Hypertable •
HBase
Columnar DBs - Supporting Companies • Facebook • Yahoo •
Ebay • Twitter • Amazon • Google • ...
Columnar DB’s - Pro’s • Horizontal scalability (replication and partitioning)
• Versioning is trivial • No real storage cost for null values • Used mainly for Big Data / data mining / Business Intelligence analysis
Columnar DB’s - Con’s • Complexity (Installation, infrastructure and usage)
• Design your schema based on how you plan to query the data • Some operations are really time expensive
Practical Use Case
Facebook Messaging Index Table Keyword #1 Keyword #2 Keyword #3
Keyword #... User ID #1 User ID #2 User ID #... Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id Timestamp Message_id
References Seven Databases in Seven Weeks: A Guide to Modern
Databases and the NoSQL Movement by Eric Redmond and Jim R. Wilson
Thank you