Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MongoDB for Analytics
Search
John Nunemaker
PRO
May 04, 2012
Programming
21
2.2k
MongoDB for Analytics
Presented at MongoSF on May 4th, 2012.
John Nunemaker
PRO
May 04, 2012
Tweet
Share
More Decks by John Nunemaker
See All by John Nunemaker
Atom
jnunemaker
PRO
10
4.2k
MongoDB for Analytics
jnunemaker
PRO
11
890
Addicted to Stable
jnunemaker
PRO
32
2.5k
MongoDB for Analytics
jnunemaker
PRO
16
30k
Why You Should Never Use an ORM
jnunemaker
PRO
55
9.3k
Why NoSQL?
jnunemaker
PRO
10
920
Don't Repeat Yourself, Repeat Others
jnunemaker
PRO
7
3.3k
I Have No Talent
jnunemaker
PRO
14
950
Why MongoDB Is Awesome
jnunemaker
PRO
18
4.4k
Other Decks in Programming
See All in Programming
Qiita Bash
mercury_dev0517
1
190
サービスレベルを管理してアジャイルを加速しよう!! / slm-accelerate-agility
tomoyakitaura
1
170
Building a macOS screen saver with Kotlin (Android Makers 2025)
zsmb
1
140
Being an ethical software engineer
xgouchet
PRO
0
210
スモールスタートで始めるためのLambda×モノリス
akihisaikeda
2
180
Firebase Dynamic Linksの代替手段を自作する / Create your own Firebase Dynamic Links alternative
kubode
0
230
PHP で学ぶ OAuth 入門
azuki
1
130
AI Coding Agent Enablement - エージェントを自走させよう
yukukotani
13
5.8k
Java 24まとめ / Java 24 summary
kishida
3
470
Ruby's Line Breaks
yui_knk
2
480
新しいPHP拡張モジュールインストール方法「PHP Installer for Extensions (PIE)」を使ってみよう!
cocoeyes02
0
340
Devin入門と最近のアップデートから見るDevinの進化 / Introduction to Devin and the Evolution of Devin as Seen in Recent Update
rkaga
9
4.8k
Featured
See All Featured
VelocityConf: Rendering Performance Case Studies
addyosmani
328
24k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
9
740
Building an army of robots
kneath
304
45k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
49k
Java REST API Framework Comparison - PWX 2021
mraible
30
8.5k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
23
2.6k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
120k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.2k
A Modern Web Designer's Workflow
chriscoyier
693
190k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
21k
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Transcript
GitHub John Nunemaker MongoSF 2012 May 4, 2012 MongoDB for
Analytics A loving conversation with @jnunemaker
None
Background How hernias can be good for you
None
None
1 month Of evenings and weekends
1 year Since public launch
13 tiny servers 2 web, 6 app, 3 db, 2
queue
7-8 Million Page views per day
None
None
None
None
Implementation Imma show you how we do what we do
baby
Doing It (mostly) Live No aggregate querying
None
None
get('/track.gif') do track_service.record(...) TrackGif end
class TrackService def record(attrs) message = MessagePack.pack(attrs) @client.set(@queue, message) end
end
class TrackProcessor def run loop { process } end def
process record @client.get(@queue) end def record(message) attrs = MessagePack.unpack(message) Hit.record(attrs) end end
http://bit.ly/rt-kestrel
class Hit def record site.atomic_update(site_updates) Resolution.record(self) Technology.record(self) Location.record(self) Referrer.record(self) Content.record(self)
Search.record(self) Notification.record(self) View.record(self) end end
class Resolution def record(hit) query = {'_id' => "..."} update
= {'$inc' => {}} update['$inc']["sx.#{hit.screenx}"] = 1 update['$inc']["bx.#{hit.browserx}"] = 1 update['$inc']["by.#{hit.browsery}"] = 1 collection(hit.created_on) .update(query, update, :upsert => true) end end end
Pros
Pros Space
Pros Space RAM
Pros Space RAM Reads
Pros Space RAM Reads Live
Cons
Cons Writes
Cons Writes Constraints
Cons Writes Constraints More Forethought
Cons Writes Constraints More Forethought No raw data
http://bit.ly/rt-counters http://bit.ly/rt-counters2
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
None
{ "t" => 336381, "u" => 158951, "2011" => {
"02" => { "18" => { "t" => 9, "u" => 6 } } } }
{ '$inc' => { 't' => 1, 'u' => 1,
'2011.02.18.t' => 1, '2011.02.18.u' => 1, } }
Single Document For all ranges in time frame
None
{ "_id" =>"...:10", "bx" => { "320" => 85, "480"
=> 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359,
"600" => 7359, "768" => 4515, "900" => 3833, "1024"
=> 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 } }
{ '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1,
'by.768' => 1, } }
Many Documents Search terms, content, referrers...
None
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
"sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 }, ]
Writes {'_id' => "#{sid}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
Growth Don’t say shard, don’t say shard...
Partition Hot Data Currently using collections for time frames
Bigger, Faster Server More CPU, RAM, Disk Space
Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites
Content Referrers Terms Engines Resolutions Locations
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
Partition by Server Spread writes across a ton of servers,
way down the road, not worried yet
GitHub Thank you!
[email protected]
John Nunemaker MongoSF 2012 May 4,
2012 @jnunemaker