

Adam
October 23, 2012

How my ops guy totally ruined my realtime talk last week with a single nginx module

tl;dr: #Python #Architecture #Scalability #nginx #DISQUS

Disqus' realtime infrastructure, presented at Keeping It Realtime Conf 2012.



Transcript

  1. Adam Hitchcock @NorthIsUp

    Making DISQUS Realtime
    Thursday, March 21, 13

    Notes: I am an infrastructure engineer at DISQUS, focused on realtime and the firehose.
  2. Adam Hitchcock @NorthIsUp

    Making DISQUS Realtime

    Notes: Thanks for watching me instead of the mini iPad thing. This was the title of my talk, but it is now... more of a story.
  3. Adam Hitchcock @NorthIsUp

    Making DISQUS Realtime

    Notes: This was the title of my talk, but it is now... more of a story.
  4. Adam Hitchcock @NorthIsUp

    How my ops guy totally changed my talk last week with a single nginx module
  5. what is DISQUS?

    Notes: How many of you will raise your hand? How many of you are Python people? How many of you know what DISQUS is?
  6. why do realtime?

    ๏ getting new data to the user asap
    ๏ for increased engagement
    ๏ and it looks awesome
    ๏ and we can sell (or trade) it

    Notes: We define this as ‘less than 10 seconds’, but my goal was less than one.
  7. DISQUS sees a lot of traffic

    Google Analytics: Sept 2012 - Oct 2012

    Notes: the problem is that, at max capacity, the old system handled fewer than 100 thousand concurrent users.
  8. realertime

    ๏ currently active on all DISQUS 2012 sites
    ๏ tested ‘dark’ on our existing network
    ๏ 1.5 million concurrently connected users
    ๏ 45 thousand new connections per second
    ๏ 165 thousand messages/second
    ๏ <.2 seconds latency end to end

    Notes:
    - wtf: describe DISQUS 2012
    - I’ll revisit what ‘dark’ means later, on the testing slides
    - describe the heavy-tail distribution of popularity
    - end to end does NOT include the DISQUS app
    - DEMO IT, then “so how did we build this?”
  9. so, how did we do it?

    Notes: Last time I gave this talk I had a joke here. It was at EuroPython and...
  10. thoonk redis queue, some python glue, nginx push stream, and long(er) polling

    Notes: python is our normal django site plus some formatting and processing code; redis is the thoonk queue.
  11. architecture overview

    [Diagram labels: redis queue; nginx + push stream module; “python glue” Gevent server; New Posts; nginx /pub endpoint; DISQUS embed clients; http post; DISQUS]

    Notes:
    - HARDWARE
    - HA
    - 3 flows: new info -> pubsub; new subscriptions; pubsub -> subscriptions
  12. architecture overview

    [Diagram labels: django; Formatter; Publishers; thoonk queue; New Posts; http post; nginx pub endpoint; DISQUS embed clients; other realtime stuff; nginx + push stream module]

    Notes:
    - post save and post delete hooks
    - other realtime stuff + thoonk
  13. thoonk redis queue, some python glue, nginx push stream, and long(er) polling

    Notes: python is our normal django site plus some formatting and processing code; redis is the thoonk queue.
  14. the thoonk queue

    ๏ thoonk is a queue on top of redis
    ๏ implemented as a DFA
    ๏ provides job semantics
    ๏ useful for end to end acking
    ๏ reliable job processing in distributed system
    ๏ did I mention it’s on top of redis?

    Notes: uses a zset to store items, so you get ranged queries (can’t do that on rabbit).
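The job semantics on this slide can be sketched as a small state machine: a job is available, then claimed, then either finished (acked) or returned to the queue. The class below is an in-memory stand-in written for illustration only. It is not Thoonk's API and does not talk to redis, and the method names are assumptions.

```python
import itertools
from collections import OrderedDict


class JobQueue:
    """In-memory sketch of a queue with claim/ack semantics (hypothetical API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._available = OrderedDict()  # job_id -> payload, oldest first
        self._claimed = {}               # job_id -> payload, waiting for an ack

    def put(self, payload):
        """Add a job and return its id."""
        job_id = next(self._ids)
        self._available[job_id] = payload
        return job_id

    def get(self):
        """Claim the oldest available job; it stays stored until finished."""
        if not self._available:
            return None
        job_id, payload = self._available.popitem(last=False)
        self._claimed[job_id] = payload
        return job_id, payload

    def finish(self, job_id):
        """Ack: the job was fully processed, so drop it for good."""
        del self._claimed[job_id]

    def cancel(self, job_id):
        """Nack: put a claimed job back at the head of the queue."""
        payload = self._claimed.pop(job_id)
        self._available[job_id] = payload
        self._available.move_to_end(job_id, last=False)

    def pending(self):
        """Jobs not yet acked, whether available or claimed."""
        return len(self._available) + len(self._claimed)
```

A worker that crashes between `get` and `finish` leaves the job in the claimed state instead of losing it, which is the property that makes end to end acking possible.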
  15. thoonk redis queue, some python glue, nginx push stream, and long(er) polling
  16. the python glue

    ๏ listens to a thoonk queue
    ๏ cleans & formats message
    ๏ this is the final format for end clients
    ๏ compress data now
    ๏ publish message to nginx and other firehoses
    ๏ forum:id, thread:id, user:id, post:id

    [Diagram labels: Formatter; Publishers]

    Notes:
    - django post save & post delete signals
    - thoonk was easy and fun!
    - end to end ack via thoonk (a job is not removed until fully published to nginx)
    - allows for multiple publishers: we publish to nginx, pubsubhubbub, and commercial consumers
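A rough sketch of the formatter and publisher steps listed above: build the client payload once, compress it once, and send the same bytes to every channel. The post fields, function names, and the publisher callable signature are assumptions made for illustration, not the DISQUS code.

```python
import gzip
import json


def build_payload(post):
    """Format and compress a post once; every subscriber receives these bytes."""
    message = {
        "id": post["id"],
        "forum": post["forum"],
        "thread": post["thread"],
        "user": post["user"],
        "text": post["text"].strip(),
    }
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    return gzip.compress(body)


def channels_for(post):
    """Channel ids in the forum:id, thread:id, user:id, post:id scheme."""
    return [
        "forum:%s" % post["forum"],
        "thread:%s" % post["thread"],
        "user:%s" % post["user"],
        "post:%s" % post["id"],
    ]


def publish(post, publishers):
    """Send one prebuilt payload to each publisher and channel.

    Each publisher is a callable (channel, payload) -> bool. Returns True only
    if every call succeeded; publishing stops at the first failure.
    """
    payload = build_payload(post)
    channels = channels_for(post)
    return all(pub(channel, payload) for pub in publishers for channel in channels)
```

The boolean result is what a caller would use to decide whether to ack the queue job or leave it for a retry.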
  17. thoonk redis queue, some python glue, nginx push stream, and long(er) polling
  18. nginx push stream

    ๏ follow John Watson (@wizputer) for updated #humblebrags as we ramp up traffic
    ๏ an example config can be found here: http://bit.ly/disqus-nginx-push-stream

    http://wiki.nginx.org/HttpPushStreamModule
  19. nginx push stream

    ๏ Turned on for ~50% of our network...
    ๏ ~950K subscribers (peak single machine)
    ๏ 144 Mbits/second (per machine)
    ๏ CPU usage is still well under 50%

    Notes: Have NOT tested SSL yet.
  20. config push stream

        location = /pub {
            allow 127.0.0.1;
            deny all;
            push_stream_publisher admin;
            set $push_stream_channel_id $arg_channel;
        }

        location ^~ /sub/ {
            # to maintain api compatibility we need this
            location ~ /sub/(.*)/(.*)$ {
                set $push_stream_channels_path $1:$2;
                push_stream_subscriber streaming;
                push_stream_content_type application/json;
            }
        }

    Notes: and nginx does the rest.
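The subscriber location turns a two-segment path into a `kind:id` channel name. A small Python mirror of that regex can make the mapping easier to check; the function name is hypothetical and this is only a sketch of the same pattern, not part of the nginx setup.

```python
import re

# Same pattern as the nested location block: /sub/<kind>/<id>
SUB_PATH = re.compile(r"/sub/(.*)/(.*)$")


def channel_for_path(path):
    """Return the channel id a subscribe path maps to, or None if it does not match."""
    match = SUB_PATH.search(path)
    if match is None:
        return None
    return "%s:%s" % (match.group(1), match.group(2))
```

For example, `channel_for_path("/sub/forum/cnn")` gives `"forum:cnn"`, which is the id a publisher would pass as `channel` to `/pub`.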
  21. examples

        # Subs
        curl -s 'localhost/sub/forum/cnn'
        curl -s 'localhost/sub/thread/907824578'
        curl -s 'localhost/sub/user/northisup'

        # Pubs
        curl -s -X POST 'localhost/pub?channel=forum:cnn' \
            -d '{"some sort": "of json data"}'
        curl -s -X POST 'localhost/pub?channel=thread:907824578' \
            -d '{"more": "json data"}'
        curl -s -X POST 'localhost/pub?channel=user:northisup' \
            -d '{"the idea": "I think you get it by now"}'
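The publish calls above could be issued from Python with the standard library. This sketch only builds the request object, so it can be inspected without a running nginx; the helper name and the `base` default are assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_publish_request(kind, ident, message, base="http://localhost"):
    """Build a POST to /pub for the channel '<kind>:<ident>' (hypothetical helper)."""
    # Keep ':' literal so the channel id matches the one used in the curl examples.
    channel = quote("%s:%s" % (kind, ident), safe=":")
    body = json.dumps(message).encode("utf-8")
    return Request("%s/pub?channel=%s" % (base, channel), data=body, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it. Note that the `/pub` location in the config only allows requests from 127.0.0.1.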
  22. measure nginx

        location = /push-stream-status {
            allow 127.0.0.1;
            deny all;
            push_stream_channels_statistics;
            set $push_stream_channel_id $arg_channel;
        }

    Notes: we actually used this to build a realtime stream of popular threads on DISQUS.
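One way a "popular threads" view could be derived from the status endpoint is to rank channels by subscriber count. The payload shape assumed here (an `infos` list whose entries carry `channel` and `subscribers`) is an assumption and should be checked against the module version in use; the function name is hypothetical.

```python
import json


def top_channels(stats_json, limit=3):
    """Rank channels by subscriber count from a status payload (assumed shape)."""
    infos = json.loads(stats_json).get("infos", [])
    # int() accepts counts reported either as numbers or as numeric strings.
    ranked = sorted(infos, key=lambda info: int(info["subscribers"]), reverse=True)
    return [(info["channel"], int(info["subscribers"])) for info in ranked[:limit]]
```

Polling the endpoint on an interval and publishing the ranked list to its own channel would turn this into a stream, though how DISQUS actually built theirs is not described in the talk.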
  23. thoonk redis queue, some python glue, nginx push stream, and long(er) polling
  24. long(er) polling

        onProgress: function () {
            var self = this;
            var resp = self.xhr.responseText;
            var advance = 0;
            var rows;

            // If server didn't push anything new, do nothing.
            if (!resp || self.len === resp.length) return;

            // Server returns JSON objects, one per line.
            rows = resp.slice(self.len).split('\n');

            _.each(rows, function (obj) {
                advance += (obj.length + 1);
                obj = JSON.parse(obj);
                self.trigger('progress', obj);
            });
            self.len += advance;
        }

    Notes: on a busy thread this matters; 99% of the time it doesn’t (IGN E3).
    - peak post rate ~40 msg/sec
    - peak delivery ~164K msg/sec
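The same idea, tracking how much of a growing response body has already been parsed, can be sketched in Python. This version assumes each JSON row is terminated by a newline and holds back a trailing partial row until it is complete, which is a deliberate difference from the slide's handler. The class name is hypothetical.

```python
import json


class LineStreamParser:
    """Parse newline-delimited JSON from a response body that keeps growing."""

    def __init__(self):
        self.consumed = 0  # characters of the response already parsed

    def feed(self, response_text):
        """Return objects from complete rows that earlier calls have not seen."""
        fresh = response_text[self.consumed:]
        end = fresh.rfind("\n")
        if end == -1:
            return []  # no complete row yet
        self.consumed += end + 1
        rows = fresh[:end].split("\n")
        return [json.loads(row) for row in rows if row.strip()]
```

Each call receives the full cumulative text, the way `xhr.responseText` behaves during a streaming request, and only the unseen tail is parsed.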
  25. test

    ๏ Darktime
    ๏ use existing network to loadtest
    ๏ (user complaints when it didn’t work...)
    ๏ Darkesttime
    ๏ load testing a single thread
    ๏ have knobs you can twiddle
  26. measure

    ๏ measure all the things!
    ๏ especially when the numbers don’t line up
    ๏ measuring is hard in distributed systems
    ๏ try to express things as +1 and -1 if you can
    ๏ Sentry for measuring exceptions

    Notes: so when you give a talk you can #humblebrag with numbers like “peak delivery rate” and stuff. Scales! Gauges and aggregation.
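Expressing measurements as +1 and -1 means each node only records deltas, and totals come from summing across nodes. A minimal sketch of that idea follows; it is not the Scales API, and all names are hypothetical.

```python
from collections import Counter


class Gauges:
    """Per-process gauges driven purely by +1/-1 style deltas."""

    def __init__(self):
        self._values = Counter()

    def incr(self, name, delta=1):
        self._values[name] += delta

    def decr(self, name, delta=1):
        self._values[name] -= delta

    def snapshot(self):
        """Copy of the current values, safe to ship to an aggregator."""
        return dict(self._values)


def aggregate(snapshots):
    """Sum snapshots from several processes into one view."""
    total = Counter()
    for snap in snapshots:
        total.update(snap)  # Counter.update adds values key by key
    return dict(total)
```

A connect handler would call `incr("subscribers")` and a disconnect handler `decr("subscribers")`, so the aggregated value is the current subscriber count across machines.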
  27. lessons

    ๏ do hard (computation) work early
    ๏ end-to-end acks are good, but expensive
    ๏ redis/nginx pubsub is effectively free

    Notes:
    - data processing and json formatting done once, not 1000x
    - gzipping done once, not 1000x
    - defer setting up the work in the generator until as late as possible
    - ditched e2e acks from the front end; they cost way too much
  28. special thanks

    ๏ the team at DISQUS
    ๏ like jeff who had to review all my code
    ๏ and especially our dev-ops guys
    ๏ like john watson a.k.a. @wizputer a.k.a. the one who made me rewrite this talk

    psst, we’re hiring: disqus.com/jobs

    Tell me your thoughts! @NorthIsUp
  29. slide full o’ links

    ๏ Nginx push stream module: http://wiki.nginx.org/HttpPushStreamModule
    ๏ Thoonk (redis queue): http://github.com/andyet/thoonk.py
    ๏ Sentry (distributed traceback aggregation): http://github.com/dcramer/sentry
    ๏ Gevent (python coroutines and greenlets): http://gevent.org/
    ๏ Scales (in-app metrics): http://github.com/Greplin/scales

    code.disqus.com