Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
MongoDB for Analytics
Search
John Nunemaker
PRO
May 04, 2012
Programming
21
2.2k
MongoDB for Analytics
Presented at MongoSF on May 4th, 2012.
John Nunemaker
PRO
May 04, 2012
Tweet
Share
More Decks by John Nunemaker
See All by John Nunemaker
Atom
jnunemaker
PRO
10
4.2k
MongoDB for Analytics
jnunemaker
PRO
11
900
Addicted to Stable
jnunemaker
PRO
32
2.5k
MongoDB for Analytics
jnunemaker
PRO
16
30k
Why You Should Never Use an ORM
jnunemaker
PRO
56
9.3k
Why NoSQL?
jnunemaker
PRO
10
920
Don't Repeat Yourself, Repeat Others
jnunemaker
PRO
7
3.4k
I Have No Talent
jnunemaker
PRO
14
960
Why MongoDB Is Awesome
jnunemaker
PRO
18
4.4k
Other Decks in Programming
See All in Programming
Empowering Developers with HTML-Aware ERB Tooling @ RubyKaigi 2025, Matsuyama, Ehime
marcoroth
2
960
AI時代の開発者評価について
ayumuu
0
230
Flutterでllama.cppをつかってローカルLLMを試してみた
sakuraidayo
0
130
音声プラットフォームのアーキテクチャ変遷から学ぶ、クラウドネイティブなバッチ処理 (20250422_CNDS2025_Batch_Architecture)
thousanda
0
400
On-the-fly Suggestions of Rewriting Method Deprecations
ohbarye
3
4.8k
Browser and UI #2 HTML/ARIA
ken7253
2
170
파급효과: From AI to Android Development
l2hyunwoo
0
160
State of Namespace
tagomoris
5
2.4k
Cursor/Devin全社導入の理想と現実
saitoryc
28
21k
七輪ライブラリー: Claude AI で作る Next.js アプリ
suneo3476
1
180
Beyond_the_Prompt__Evaluating__Testing__and_Securing_LLM_Applications.pdf
meteatamel
0
110
一緒に働きたくなるプログラマの思想 #QiitaConference
mu_zaru
79
20k
Featured
See All Featured
The Invisible Side of Design
smashingmag
299
50k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
357
30k
What's in a price? How to price your products and services
michaelherold
245
12k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Documentation Writing (for coders)
carmenintech
71
4.7k
Measuring & Analyzing Core Web Vitals
bluesmoon
7
410
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
129
19k
VelocityConf: Rendering Performance Case Studies
addyosmani
329
24k
GraphQLとの向き合い方2022年版
quramy
46
14k
Designing Experiences People Love
moore
142
24k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Optimizing for Happiness
mojombo
378
70k
Transcript
GitHub John Nunemaker MongoSF 2012 May 4, 2012 MongoDB for
Analytics A loving conversation with @jnunemaker
None
Background How hernias can be good for you
None
None
1 month Of evenings and weekends
1 year Since public launch
13 tiny servers 2 web, 6 app, 3 db, 2
queue
7-8 Million Page views per day
None
None
None
None
Implementation Imma show you how we do what we do
baby
Doing It (mostly) Live No aggregate querying
None
None
get('/track.gif') do track_service.record(...) TrackGif end
class TrackService def record(attrs) message = MessagePack.pack(attrs) @client.set(@queue, message) end
end
class TrackProcessor def run loop { process } end def
process record @client.get(@queue) end def record(message) attrs = MessagePack.unpack(message) Hit.record(attrs) end end
http://bit.ly/rt-kestrel
class Hit def record site.atomic_update(site_updates) Resolution.record(self) Technology.record(self) Location.record(self) Referrer.record(self) Content.record(self)
Search.record(self) Notification.record(self) View.record(self) end end
class Resolution def record(hit) query = {'_id' => "..."} update
= {'$inc' => {}} update['$inc']["sx.#{hit.screenx}"] = 1 update['$inc']["bx.#{hit.browserx}"] = 1 update['$inc']["by.#{hit.browsery}"] = 1 collection(hit.created_on) .update(query, update, :upsert => true) end end end
Pros
Pros Space
Pros Space RAM
Pros Space RAM Reads
Pros Space RAM Reads Live
Cons
Cons Writes
Cons Writes Constraints
Cons Writes Constraints More Forethought
Cons Writes Constraints More Forethought No raw data
http://bit.ly/rt-counters http://bit.ly/rt-counters2
Time Frame Minute, hour, month, day, year, forever?
# of Variations One document vs many
Single Document Per Time Frame
None
{ "t" => 336381, "u" => 158951, "2011" => {
"02" => { "18" => { "t" => 9, "u" => 6 } } } }
{ '$inc' => { 't' => 1, 'u' => 1,
'2011.02.18.t' => 1, '2011.02.18.u' => 1, } }
Single Document For all ranges in time frame
None
{ "_id" =>"...:10", "bx" => { "320" => 85, "480"
=> 318, "800" => 1938, "1024" => 5033, "1280" => 6288, "1440" => 2323, "1600" => 3817, "2000" => 137 }, "by" => { "480" => 2205, "600" => 7359,
"600" => 7359, "768" => 4515, "900" => 3833, "1024"
=> 2026 }, "sx" => { "320" => 191, "480" => 179, "800" => 195, "1024" => 1059, "1280" => 5861, "1440" => 3533, "1600" => 7675, "2000" => 1279 } }
{ '$inc' => { 'sx.1440' => 1, 'bx.1280' => 1,
'by.768' => 1, } }
Many Documents Search terms, content, referrers...
None
[ { "_id" => "<oid>:<hash>", "t" => "ruby class variables",
"sid" => BSON::ObjectId('<oid>'), "v" => 352 }, { "_id" => "<oid>:<hash>", "t" => "ruby unless", "sid" => BSON::ObjectId('<oid>'), "v" => 347 }, ]
Writes {'_id' => "#{sid}:#{hash}"}
Reads [['sid', 1], ['v', -1]]
Growth Don’t say shard, don’t say shard...
Partition Hot Data Currently using collections for time frames
Bigger, Faster Server More CPU, RAM, Disk Space
Users Sites Content Referrers Terms Engines Resolutions Locations Users Sites
Content Referrers Terms Engines Resolutions Locations
Partition by Function Spread writes across a few servers
Users Sites Content Referrers Terms Engines Resolutions Locations
Partition by Server Spread writes across a ton of servers,
way down the road, not worried yet
GitHub Thank you!
[email protected]
John Nunemaker MongoSF 2012 May 4,
2012 @jnunemaker