MongoDB for Analytics

Presented at MongoChicago on November 13, 2012.

John Nunemaker

November 13, 2012

Transcript

  1. GitHub
    John Nunemaker
    MongoChicago 2012
    November 12, 2012
    MongoDB for Analytics
    A loving conversation with @jnunemaker

  2. Background
    How hernias can be good for you

  3. 1 month
    Of evenings and weekends

  4. 18 months
    Since public launch

  5. 10-15 Million
    Page views per day

  6. 2.7 Billion
    Page views to date

  7. 13 tiny servers
    2 web, 6 app, 3 db, 2 queue

  8. requests/sec

  9. Implementation
    How we do what we do

  10. Doing It (mostly) Live
    No aggregate querying

  11. get('/track.gif') do
        track_service.record(...)
        TrackGif
      end

  12. class TrackService
        def record(attrs)
          message = MessagePack.pack(attrs)
          @client.set(@queue, message)
        end
      end

  13. class TrackProcessor
        def run
          loop { process }
        end
        def process
          record @client.get(@queue)
        end
        def record(message)
          attrs = MessagePack.unpack(message)
          Hit.record(attrs)
        end
      end

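The two classes above form a small producer/consumer pipeline: the web request only serializes the hit and enqueues it, and a separate processor dequeues and records it. The following is a minimal, self-contained sketch of that flow under stated assumptions. `MemoryQueueClient` is a hypothetical in-memory stand-in for whatever queue sits behind `@client`, JSON from the standard library stands in for MessagePack, and the processor collects hits in an array instead of calling `Hit.record`.

```ruby
require "json"

# Hypothetical in-memory stand-in for the queue client. It only mirrors the
# set/get calls used on the slides; it is not the real client's API.
class MemoryQueueClient
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def set(queue, message)
    @queues[queue].push(message)
  end

  # Returns nil when the queue is empty.
  def get(queue)
    @queues[queue].shift
  end
end

# Producer: serialize the hit and enqueue it, nothing else.
class TrackService
  def initialize(client, queue)
    @client = client
    @queue = queue
  end

  def record(attrs)
    # JSON is an assumption here; the slides use MessagePack.pack.
    @client.set(@queue, JSON.generate(attrs))
  end
end

# Consumer: dequeue one message at a time and record it.
class TrackProcessor
  attr_reader :recorded

  def initialize(client, queue)
    @client = client
    @queue = queue
    @recorded = []
  end

  # Processes one message; returns false when nothing was waiting.
  def process
    message = @client.get(@queue)
    return false if message.nil?

    record(message)
    true
  end

  def record(message)
    # The slides call Hit.record(attrs) at this point.
    @recorded << JSON.parse(message)
  end
end
```

Keeping the request path down to a single enqueue means slow database writes do not hold up the tracking response; the processor can catch up at its own pace.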

  14. http://bit.ly/rt-kestrel

  15. class Hit
        def record
          site.atomic_update(site_updates)
          Resolution.record(self)
          Technology.record(self)
          Location.record(self)
          Referrer.record(self)
          Content.record(self)
          Search.record(self)
          Notification.record(self)
          View.record(self)
        end
      end

  16. class Resolution
        def self.record(hit)
          query = {'_id' => "..."}
          update = {'$inc' => {}}
          update['$inc']["sx.#{hit.screenx}"] = 1
          update['$inc']["bx.#{hit.browserx}"] = 1
          update['$inc']["by.#{hit.browsery}"] = 1
          collection(hit.created_on)
            .update(query, update, :upsert => true)
        end
      end

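The update above relies on two MongoDB behaviors: `$inc` adds to a field addressed by a dotted path, creating it if it is missing, and `:upsert => true` creates the document when no match exists. As a rough illustration of the resulting counter document, here is a plain-Ruby simulation. `apply_inc` is a hypothetical helper written for this sketch, not part of any MongoDB driver, and it only models the increment-or-create behavior.

```ruby
# Simulate the effect of a MongoDB-style $inc on a plain Ruby hash.
# Dotted paths such as "sx.1440" address nested keys, and any missing
# parent hash or counter is created on the way (similar to an upsert).
def apply_inc(document, increments)
  increments.each do |path, amount|
    *parents, leaf = path.split(".")
    target = parents.reduce(document) { |node, key| node[key] ||= {} }
    target[leaf] = target.fetch(leaf, 0) + amount
  end
  document
end

# Two hits with the same screen width and different browser widths.
counters = {}
apply_inc(counters, "sx.1440" => 1, "bx.1280" => 1)
apply_inc(counters, "sx.1440" => 1, "bx.1024" => 1)
```

Because every hit is folded into counters at write time, a report only needs to read one small document rather than aggregate raw hits.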

  17. Pros
    Space
    RAM

  18. Pros
    Space
    RAM
    Reads

  19. Pros
    Space
    RAM
    Reads
    Live

  20. Cons
    Writes
    Constraints

  21. Cons
    Writes
    Constraints
    More Forethought

  22. Cons
    Writes
    Constraints
    More Forethought
    No raw data

  23. http://bit.ly/rt-counters
    http://bit.ly/rt-counters2

  24. Time Frame
    Minute, hour, month, day, year, forever?

  25. # of Variations
    One document vs many

  26. Single Document
    Per Time Frame

  27. {
        "t" => 336381,
        "u" => 158951,
        "2011" => {
          "02" => {
            "18" => {
              "t" => 9,
              "u" => 6
            }
          }
        }
      }

  28. {
        '$inc' => {
          't' => 1,
          'u' => 1,
          '2011.02.18.t' => 1,
          '2011.02.18.u' => 1,
        }
      }

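The update document above can be derived from the hit's date. The sketch below shows one way to build it, assuming that `t` and `u` stand for total and unique counts (the slides do not spell this out) and that unique counters are only incremented for a visitor's first hit. `hit_increments` and its `unique:` flag are hypothetical names for this example.

```ruby
require "date"

# Build the $inc document for one hit: bump the all-time counters and the
# counters nested under the year, month and day of the hit.
def hit_increments(date, unique:)
  day_path = date.strftime("%Y.%m.%d") # e.g. "2011.02.18"
  increments = { "t" => 1, "#{day_path}.t" => 1 }

  # Assumption: "u" is a unique-visitor count, incremented only once
  # per visitor.
  if unique
    increments["u"] = 1
    increments["#{day_path}.u"] = 1
  end

  { "$inc" => increments }
end

update = hit_increments(Date.new(2011, 2, 18), unique: true)
```

A single update therefore maintains both the running totals and the per-day breakdown in the same document.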

  29. Single Document
    For all ranges in time frame

  30. {
    "_id" =>"...:10",
    "bx" => {
    "320" => 85,
    "480" => 318,
    "800" => 1938,
    "1024" => 5033,
    "1280" => 6288,
    "1440" => 2323,
    "1600" => 3817,
    "2000" => 137
    },
    "by" => {
    "480" => 2205,
    "600" => 7359,

  31. "600" => 7359,
    "768" => 4515,
    "900" => 3833,
    "1024" => 2026
    },
    "sx" => {
    "320" => 191,
    "480" => 179,
    "800" => 195,
    "1024" => 1059,
    "1280" => 5861,
    "1440" => 3533,
    "1600" => 7675,
    "2000" => 1279
    }
    }

  32. {
        '$inc' => {
          'sx.1440' => 1,
          'bx.1280' => 1,
          'by.768' => 1,
        }
      }

  33. Many Documents
    Search terms, content, referrers...

  34. [
        {
          "_id" => ":",
          "t" => "ruby class variables",
          "sid" => BSON::ObjectId(''),
          "v" => 352
        },
        {
          "_id" => ":",
          "t" => "ruby unless",
          "sid" => BSON::ObjectId(''),
          "v" => 347
        },
      ]

  35. Writes
    {'_id' => "#{sid}:#{hash}"}

  36. Reads
    [['sid', 1], ['v', -1]]

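With many documents per site, writes address a document by a deterministic `_id` built from the site id and a hash of the value, and reads use an index on `sid` ascending and `v` descending to fetch the most frequent entries. The sketch below imitates both sides in plain Ruby. The slides do not say which hash function is used, so MD5 is an assumption here, and `term_id` and `top_terms` are hypothetical helpers that operate on an in-memory array rather than a collection.

```ruby
require "digest/md5"

# Deterministic _id for a (site, term) pair, so repeated hits for the
# same term always update the same document.
# Assumption: MD5 is used only as an example hash.
def term_id(site_id, term)
  "#{site_id}:#{Digest::MD5.hexdigest(term)}"
end

# In-memory equivalent of querying by sid and sorting by v descending,
# which is the access pattern the [['sid', 1], ['v', -1]] index serves.
def top_terms(documents, site_id, limit)
  documents
    .select { |doc| doc["sid"] == site_id }
    .sort_by { |doc| -doc["v"] }
    .first(limit)
end

documents = [
  { "_id" => term_id("site-1", "ruby unless"), "t" => "ruby unless", "sid" => "site-1", "v" => 347 },
  { "_id" => term_id("site-1", "ruby class variables"), "t" => "ruby class variables", "sid" => "site-1", "v" => 352 },
  { "_id" => term_id("site-2", "ruby unless"), "t" => "ruby unless", "sid" => "site-2", "v" => 12 },
]
```

Because the `_id` is derived from the data itself, the write can be a single upsert with no lookup beforehand.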

  37. Growth
    Don’t say shard, don’t say shard...

  38. Partition Hot Data
    Currently using collections for time frames

  39. [
    "content.2011.7",
    "content.2011.8",
    "content.2011.9",
    "content.2011.10",
    "content.2011.11",
    "content.2011.12",
    "content.2012.1",
    "content.2012.2",
    "content.2012.3",
    "content.2012.4",
    ]

  40. [
    "resolutions.2011",
    "resolutions.2012",
    ]

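The collection names above follow a simple pattern: a base name plus the year, and for high-volume data also the month without zero padding. A minimal sketch of how such names could be derived from a date follows; the helper names are hypothetical and not taken from the talk.

```ruby
require "date"

# "content" + 2011-07-04 => "content.2011.7" (month is not zero padded).
def monthly_collection(base, date)
  "#{base}.#{date.year}.#{date.month}"
end

# "resolutions" + 2012-01-01 => "resolutions.2012"
def yearly_collection(base, date)
  "#{base}.#{date.year}"
end

# All monthly collections touched by a date range, oldest first. A report
# spanning several months would read from each of these in turn.
def monthly_collections_between(base, from, to)
  names = []
  cursor = Date.new(from.year, from.month, 1)
  while cursor <= to
    names << monthly_collection(base, cursor)
    cursor = cursor >> 1 # advance by one month
  end
  names
end
```

Since writes always target the current time frame, only the newest collection and its indexes need to stay hot in RAM.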

  41. Move
    BigintMove

  42. Move
    BigintMove
    MakeYouWannaMove

  43. Move
    BigintMove
    MakeYouWannaMove
    DaMove

  44. Move
    BigintMove
    MakeYouWannaMove
    DaMove
    SmoothMove

  45. Move
    BigintMove
    MakeYouWannaMove
    DaMove
    SmoothMove
    NightMove

  46. Move
    BigintMove
    MakeYouWannaMove
    DaMove
    SmoothMove
    NightMove
    DanceMove

  47. Bigger, Faster Server
    More CPU, RAM, Disk Space

  48. Users
    Sites
    Content
    Referrers
    Terms
    Engines
    Resolutions
    Locations
    Users
    Sites
    Content
    Referrers
    Terms
    Engines
    Resolutions
    Locations

  49. Partition by Function
    Spread writes across a few servers

  50. Users
    Sites
    Content
    Referrers
    Terms
    Engines
    Resolutions
    Locations

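Partitioning by function means each kind of data (users, content, referrers and so on) lives on a server chosen by what it is, rather than by a shard key. The slides do not show which data goes to which server, so the sketch below is purely illustrative: `FunctionRouter`, the server names, and the assignments are all hypothetical.

```ruby
# Route a collection to a server based on its functional family, i.e. the
# part of the collection name before the first dot.
class FunctionRouter
  def initialize(assignments, default:)
    @assignments = assignments
    @default = default
  end

  def server_for(collection_name)
    family = collection_name.split(".").first
    @assignments.fetch(family, @default)
  end
end

# Hypothetical layout: write-heavy families get their own server and
# everything else stays on the default one.
router = FunctionRouter.new(
  { "content" => "db2", "referrers" => "db2", "terms" => "db3" },
  default: "db1"
)
```

For instance, `router.server_for("content.2012.4")` would pick the server assigned to the `content` family, while an unassigned collection such as `users` falls back to the default.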

  51. Partition by Server
    Spread writes across a ton of servers,
    way down the road, not worried yet

  52. GitHub
    Thank you!
    [email protected]
    John Nunemaker
    MongoChicago 2012
    November 12, 2012
    @jnunemaker
