Crossroads of asynchrony and graceful degradation at QCon SF 2015

Netflix, with more than 60 million subscribers worldwide and accounting for a third of the internet traffic in the United States, is a highly available internet service. To guarantee high availability, we have architected our systems so that the various failure modes of distributed systems cause graceful degradation rather than unavailability.

In our constant endeavor to improve the availability of our services, we are on a path to embrace asynchrony fully, using libraries like RxJava and RxNetty. Moving from a synchronous world to asynchronous applications brings interesting challenges as well as novel solutions, specifically in handling the various failure modes of distributed systems, such as latency, partial failures and abusive clients.

In this talk, Nitesh Kant will describe how embracing asynchrony in our applications, from networking to business processing, creates gracefully degrading and highly resilient applications.

Presented at QCon SF 2015: https://qconsf.com/sf2015/presentation/crossroads-of-asynchrony-and-graceful-degradation

Video: https://www.infoq.com/presentations/netflix-asynchronous-apps

Nitesh Kant

November 17, 2015

Transcript

  1. Who am I? Nitesh Kant (@NiteshKant). ❖ Engineer, Edge Engineering, Netflix. ❖ Core contributor, RxNetty (https://github.com/ReactiveX/RxNetty). ❖ Contributor, Zuul (https://github.com/Netflix/zuul).
  2. public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } Disclaimer: This is an example and not an exact representation of the processing.
  3. Synchronicity. public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); }
  4. The bigger picture. Price of being synchronous? public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); }
  5. In a microservices world. [Diagram: Edge Service with the Ratings Service, Video Metadata Service and Bookmarks Service.]
  6. In a microservices world. [Diagram: Edge Service server thread pool (five threads); call: getMovieMetadata(movieId).]
  7. In a microservices world. [Diagram: Edge Service server thread pool (five threads); calls: getMovieMetadata(movieId), getBookmark(movieId, userId).]
  8. In a microservices world. [Diagram: Edge Service server thread pool (five threads); calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId).]
  9. [Diagram: Edge Service server thread pool (five threads); call: getRatings(movieId).]
  10. [Diagram: Edge Service server thread pool (five threads) and the Ratings Service; call: getRatings(movieId).]
  11. [Diagram: Edge Service server thread pool (five threads); call: getRatings(movieId).]
  12. [Diagram: Edge Service server thread pool (five threads); call: getRatings(movieId).]
  13. [Diagram: Edge Service with a server thread pool (five threads) and a client thread pool (five threads); call: getRatings(movieId).]
  14. [Diagram: Edge Service with a server thread pool (five threads) and a client thread pool (five threads); call: getRatings(movieId).]
  15. [Diagram: Edge Service with a server thread pool (five threads) and a client thread pool (five threads); call: getRatings(movieId).]
  16. Managing client thread pools. [Diagram: client thread pool (five threads).]
  17. Clients have become our babies. [Diagram: Edge Service server thread pool (five threads) plus three client thread pools (five threads each); calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId).]
  18. This happens at high CPU usage. So, don't let the system reach that limit… a.k.a. throttling.
  19. Retries. [Diagram: Edge Service calling the Video Metadata Service.]
  20. Retries. [Diagram: Edge Service calling the Video Metadata Service cluster.]
  21. Retries. [Diagram: Edge Service calling the Video Metadata Service cluster.]
  22. Retries. [Diagram: Edge Service calling the Video Metadata Service cluster.]
  23. Retries. [Diagram: Edge Service calling the Video Metadata Service cluster.]
  24. This cannot adapt… public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); }
  25. What should be async? [Diagram: Edge Service and Video Metadata Service; calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId); layer: Application logic.]
  26. What should be async? [Diagram: Edge Service and Video Metadata Service; calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId); layers: Application logic, I/O.]
  27. What should be async? [Diagram: Edge Service and Video Metadata Service; calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId); layers: Application logic, I/O, Network protocol.]
  28. Function composition. public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); }
  29. Function composition: composing the processing of a method into a single control point. public Observable<Movie> getMovie(String movieId) { return Observable.zip(getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId), (meta, bmark, rating) -> new Movie(meta, bmark, rating)); }
  30. Composing the processing of a method into a single control point. Flow & Lifecycle Control with
  31. What should be async? [Diagram: Edge Service and Video Metadata Service; calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId); layers: Application logic, I/O, Network protocol.]
  32. I/O. [Diagram: Edge Service with a server thread pool (five threads) and a client thread pool (five threads); call: getRatings(movieId).]
  33. I/O. [Diagram: Edge Service with an inbound event loop serving five connections.] Eventloops = f(number of cores).
  34. I/O. [Diagram: Edge Service with an inbound event loop serving five connections.] Connections multiplexed on a single eventloop.
  35. I/O. [Diagram: Edge Service with an inbound event loop and an outbound event loop (five connections each); call: getMovieMetadata(movieId).]
  36. I/O. [Diagram: Edge Service with an inbound event loop and an outbound event loop (five connections each); call: getMovieMetadata(movieId).] Clients share the eventloops with the server.
  37. I/O. [Diagram: Edge Service with an inbound event loop and an outbound event loop (five connections each); call: getMovieMetadata(movieId).] All clients share the same eventloop.
  38. What should be async? [Diagram: Edge Service and Video Metadata Service; calls: getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId); layers: Application logic, I/O, Network protocol.]
  39. HTTP/1.1. [Diagram: requests GET /movie?id=3 HTTP/1.1, GET /movie?id=2 HTTP/1.1 and GET /movie?id=1 HTTP/1.1 on one connection; responses HTTP/1.1 200 OK for ID: 1 …, ID: 2 … and ID: 3 …, in that order.]
  40. HTTP/1.1. [Diagram: requests GET /movie?id=3 HTTP/1.1, GET /movie?id=2 HTTP/1.1 and GET /movie?id=1 HTTP/1.1 on one connection; responses HTTP/1.1 200 OK for ID: 1 …, ID: 2 … and ID: 3 …, in that order.] Head-of-line blocking => synchronous.
  41. Composing the processing of the entire application into a single control point. Flow & Lifecycle Control with
  42. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores.]
  43. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores; the response flows back as Observable<Movie>.]
  44. Latency. [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).]
  45. Latency. [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] Impact is localized to the connection.
  46. Latency. [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] Impact is localized to the connection. An outstanding request has little cost.
  47. An outstanding request has little cost. [Diagram: GET /movie?id=1 HTTP/1.1, followed later by HTTP/1.1 200 OK …, with the interval between them bracketed.] Any state stored between request and response is costly.
  48. Overload & thundering herds. [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] Reduce work done when overloaded.
  49. Overload & thundering herds. [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] Reduce work done when overloaded. Stop accepting new requests.
  50. Request-leasing. [Diagram: Peer 1 and Peer 2 on a network connection; “Lease” 5 requests for 1 minute; requests GET /movie?id=1 HTTP/1.1 and GET /movie?id=2 HTTP/1.1.]
  51. Server capacity: 100 RPM. [Diagram: Client 1, Client 2, Client 8; “Lease” 10 requests for 1 minute.]
  52. Server capacity: 100 RPM. [Diagram: Client 1, Client 2, Client 8; each granted a “Lease” of 10 requests for 1 minute.]
  53. Server capacity: 100 RPM. [Diagram: Client 1, Client 2, Client 8; each granted a “Lease” of 10 requests for 1 minute.] Reserve capacity: 20 RPM.
  54. Time-bound lease. No extra work for cancelling leases. Receiver controls the flow of requests. [Diagram: “Lease” 10 requests for 1 minute.]
  55. Server capacity: 20 RPM. [Diagram: Client 1, Client 2, Client 8; “Lease” 5 requests for 1 minute; “Lease” 2 requests for 1 minute; X: no more “Lease”.]
  56. Server capacity: 20 RPM. [Diagram: Client 1, Client 2, Client 8; “Lease” 5 requests for 1 minute; “Lease” 2 requests for 1 minute; X: no more “Lease”.] Prioritization.
  57. Threadpools? [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] I/O is non-blocking.
  58. Threadpools? [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] Application code is non-blocking.
  59. Threadpools? [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] No blocking/waiting => only CPU work.
  60. Threadpools? [Diagram: Edge Service with inbound and outbound event loops (five connections each); call: getMovieMetadata(movieId).] No blocking/waiting => only CPU work. So, eventloops = # of cores.
  61. Case for timeouts? Read timeouts: ✤ Useful in unblocking threads on socket reads. Thread timeouts: ✤ Business-level SLA. ✤ Unblock the calling thread.
  62. Case for timeouts? Read timeouts: ✤ Useful in unblocking threads on socket reads. Thread timeouts: ✤ Business-level SLA. ✤ Unblock the calling thread. [Three items crossed out (X): “As there are no blocking calls.”]
  63. Case for timeouts? Read timeouts: ✤ Useful in unblocking threads on socket reads. Thread timeouts: ✤ Business-level SLA. ✤ Unblock the calling thread.
  64. Business-level SLA. [Diagram: Edge Service and Video Metadata Service.]
  65. Business-level SLA. [Diagram: Edge Service, Video Metadata Service, Rating service and two C* stores.]
  66. Business-level SLA. [Diagram: Edge Service, Video Metadata Service, Rating service and two C* stores.]
  67. Business-level SLA. [Diagram: Edge Service, Video Metadata Service, Rating service and two C* stores.] Thread timeouts are pretty invasive at every level.
  68. Business-level SLA. [Diagram: Edge Service, Video Metadata Service, Rating service and two C* stores.] Thread timeouts are pretty invasive at every level. Do we need them at every step?
  69. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores.]
  70. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores.] Business timeouts are for a client request.
  71. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores.]
  72. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores; one X mark.]
  73. [Diagram: request /movie?id=123; Edge Service, Video Metadata Service, Rating service and two C* stores; five X marks.]
  74. [Diagram: Edge Service, Video Metadata Service, Rating service and two C* stores; labels: Request Leases, Cancellations, Observable<Movie>.]
  75. [Diagram: Edge Service, Video Metadata Service, Rating service and two C* stores; labels: Request Leases, Cancellations, Observable<Movie>.]
  76. public Movie getMovie(String movieId) { Metadata metadata = getMovieMetadata(movieId); Bookmark bookmark = getBookmark(movieId, userId); Rating rating = getRatings(movieId); return new Movie(metadata, bookmark, rating); } public Observable<Movie> getMovie(String movieId) { return Observable.zip(getMovieMetadata(movieId), getBookmark(movieId, userId), getRatings(movieId), (meta, bmark, rating) -> new Movie(meta, bmark, rating)); }
  77. Resources. Asynchronous function composition: RxJava (https://github.com/ReactiveX/RxJava). I/O: RxNetty (https://github.com/ReactiveX/RxNetty). Network protocol: ReactiveSocket (http://reactivesocket.io/).
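Slides 13 to 17 show each downstream client getting its own thread pool. A minimal sketch of that idea follows, assuming plain java.util.concurrent rather than any Netflix library; ClientPoolBulkhead, callOrFallback and the pool sizes are hypothetical. The pool is bounded, and a call it cannot accept degrades to a fallback instead of tying up a server thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class ClientPoolBulkhead {
    // A bounded pool per downstream client: a stalled dependency can exhaust only its own threads and queue.
    static ThreadPoolExecutor newClientPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.AbortPolicy());
    }

    // Submits a call; when the pool and its queue are full, degrade to the fallback instead of waiting.
    static <T> Future<T> callOrFallback(ExecutorService pool, Callable<T> call, T fallback) {
        try {
            return pool.submit(call);
        } catch (RejectedExecutionException e) {
            return CompletableFuture.completedFuture(fallback);
        }
    }

    // Simulates a stalled ratings dependency; returns {result of 4th call, result of 1st call}.
    static String[] demo() {
        ThreadPoolExecutor ratingsPool = newClientPool(2, 1);
        CountDownLatch ratingsStalled = new CountDownLatch(1);
        Callable<String> slowCall = () -> {
            ratingsStalled.await();
            return "rating";
        };
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            // Calls 1-2 occupy both threads, call 3 waits in the queue, call 4 is rejected.
            results.add(callOrFallback(ratingsPool, slowCall, "no-rating"));
        }
        try {
            String degraded = results.get(3).get(); // already complete: no thread waits on it
            ratingsStalled.countDown();              // the dependency recovers
            String served = results.get(0).get();
            return new String[] {degraded, served};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            ratingsPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0] + " / " + r[1]); // no-rating / rating
    }
}
```

Isolating each client in its own pool contains a slow dependency, but it is also why the slides describe client pools as something that must be constantly tuned.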
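Slides 18, 48 and 49 argue for throttling: stop accepting work before the server reaches the CPU level where it degrades. The sketch below, a hypothetical Throttle class, uses a limit on in-flight requests (a java.util.concurrent.Semaphore) as a stand-in for that CPU threshold; the talk does not say how the limit is chosen, so maxInFlight is an assumption.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Throttle {
    private final Semaphore permits;

    // maxInFlight is a hypothetical knob, sized to keep the server below the CPU level where it degrades.
    Throttle(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Runs the request if a permit is free; otherwise rejects at once (e.g. respond 503) rather than queueing.
    <T> T execute(Supplier<T> work, Supplier<T> onRejected) {
        if (!permits.tryAcquire()) {
            return onRejected.get();
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        Throttle throttle = new Throttle(1);
        // The nested request arrives while the only permit is held, so it is shed.
        String result = throttle.execute(
                () -> throttle.execute(() -> "served", () -> "throttled"),
                () -> "throttled");
        System.out.println(result); // throttled
    }
}
```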
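Slides 19 to 23 show a failed call being retried against another instance of the Video Metadata Service cluster. A minimal sketch, assuming a list of instance names and a request function; RetryOnNextInstance and its parameters are hypothetical, and a production client would likely also bound retries with backoff or a retry budget.

```java
import java.util.List;
import java.util.function.Function;

class RetryOnNextInstance {
    // Tries up to maxAttempts instances, moving to a different instance after each failure.
    static <T> T call(List<String> instances, int maxAttempts, Function<String, T> request) {
        int attempts = Math.min(maxAttempts, instances.size());
        RuntimeException lastFailure = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return request.apply(instances.get(i));
            } catch (RuntimeException e) {
                lastFailure = e; // e.g. a timeout; retry on the next instance
            }
        }
        throw new IllegalStateException("all " + attempts + " attempts failed", lastFailure);
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("metadata-1", "metadata-2", "metadata-3");
        String result = call(cluster, 2, host -> {
            if (host.equals("metadata-1")) {
                throw new RuntimeException("read timeout");
            }
            return "metadata from " + host;
        });
        System.out.println(result); // metadata from metadata-2
    }
}
```

In the synchronous model each retry keeps the caller's thread occupied for the full duration, which is part of why slide 24 says this approach cannot adapt.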
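Slides 29 and 76 replace three sequential blocking calls with Observable.zip. To keep the example free of third-party dependencies, the sketch below swaps in java.util.concurrent.CompletableFuture for RxJava. The record types, the stubbed service calls and passing userId as a parameter are assumptions (the slide code uses userId without declaring it). Records require Java 16 or later.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for the slide's types.
record Metadata(String title) {}
record Bookmark(long positionSeconds) {}
record Rating(double stars) {}
record Movie(Metadata metadata, Bookmark bookmark, Rating rating) {}

class MovieService {
    // Stubbed remote calls; a real edge service would issue non-blocking network I/O here.
    static CompletableFuture<Metadata> getMovieMetadata(String movieId) {
        return CompletableFuture.supplyAsync(() -> new Metadata("Movie " + movieId));
    }

    static CompletableFuture<Bookmark> getBookmark(String movieId, String userId) {
        return CompletableFuture.supplyAsync(() -> new Bookmark(42));
    }

    static CompletableFuture<Rating> getRatings(String movieId) {
        return CompletableFuture.supplyAsync(() -> new Rating(4.5));
    }

    // Counterpart of Observable.zip: the three calls run concurrently, the composition itself
    // never blocks, and the Movie is assembled once all three have completed.
    static CompletableFuture<Movie> getMovie(String movieId, String userId) {
        CompletableFuture<Metadata> meta = getMovieMetadata(movieId);
        CompletableFuture<Bookmark> bmark = getBookmark(movieId, userId);
        CompletableFuture<Rating> rating = getRatings(movieId);
        return CompletableFuture.allOf(meta, bmark, rating)
                .thenApply(done -> new Movie(meta.join(), bmark.join(), rating.join()));
    }

    public static void main(String[] args) {
        // join() here only keeps the demo's main thread alive until the result is ready.
        Movie movie = getMovie("123", "user-1").join();
        System.out.println(movie);
    }
}
```

Because getMovie returns a single future, the whole method becomes one control point: a caller can attach a timeout, a fallback or cancellation to it without touching the three calls inside.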
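Slides 33 to 37 and 60 describe many connections multiplexed onto a few event loops, one per core, with client connections sharing the server's loops. A minimal sketch of the assignment side only, assuming single-threaded executors as stand-in event loops and no real socket I/O; SimpleEventLoops is a hypothetical class, not Netty's EventLoopGroup.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SimpleEventLoops implements AutoCloseable {
    private final ExecutorService[] loops;
    private final AtomicInteger next = new AtomicInteger();

    // Each "event loop" is one thread; with only non-blocking work, one loop per core suffices.
    SimpleEventLoops(int loopCount) {
        loops = new ExecutorService[loopCount];
        for (int i = 0; i < loopCount; i++) {
            loops[i] = Executors.newSingleThreadExecutor();
        }
    }

    static SimpleEventLoops onePerCore() {
        return new SimpleEventLoops(Runtime.getRuntime().availableProcessors());
    }

    // Pins a connection (inbound or outbound) to one loop, round-robin; every event on that
    // connection then runs on the loop's single thread, so many connections share one thread.
    ExecutorService register() {
        return loops[Math.floorMod(next.getAndIncrement(), loops.length)];
    }

    int size() {
        return loops.length;
    }

    @Override
    public void close() {
        for (ExecutorService loop : loops) {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        try (SimpleEventLoops group = new SimpleEventLoops(2)) {
            ExecutorService first = group.register();
            ExecutorService second = group.register();
            ExecutorService third = group.register();
            System.out.println((first != second) + " " + (first == third)); // true true
        }
    }
}
```

Pinning a connection to one loop means its events never need locking against each other, which is what makes a thread count equal to the core count workable.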
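Slides 39 and 40 attribute head-of-line blocking to HTTP/1.1, where pipelined responses must be returned in request order. The arithmetic below makes that concrete under simplifying assumptions: all requests are processed concurrently starting at time 0, transfer time is ignored, and the multiplexed case assumes a protocol that tags each response with its request id, as HTTP/2 does with stream identifiers. HeadOfLineBlocking is a hypothetical name.

```java
import java.util.Arrays;

class HeadOfLineBlocking {
    // HTTP/1.1 pipelining: responses go back in request order, so a response cannot be
    // delivered before every response ahead of it, however fast it was produced.
    static long[] pipelinedDeliveryMs(long[] processingMs) {
        long[] delivered = new long[processingMs.length];
        long slowestAhead = 0;
        for (int i = 0; i < processingMs.length; i++) {
            slowestAhead = Math.max(slowestAhead, processingMs[i]);
            delivered[i] = slowestAhead;
        }
        return delivered;
    }

    // Multiplexed protocol: each response carries its request id, so it is sent as soon as it is ready.
    static long[] multiplexedDeliveryMs(long[] processingMs) {
        return processingMs.clone();
    }

    public static void main(String[] args) {
        long[] processingMs = {300, 10, 10}; // id=1 is slow, id=2 and id=3 are fast
        System.out.println(Arrays.toString(pipelinedDeliveryMs(processingMs)));   // [300, 300, 300]
        System.out.println(Arrays.toString(multiplexedDeliveryMs(processingMs))); // [300, 10, 10]
    }
}
```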
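Slides 50 and 54 describe request leasing: the receiver grants the sender a number of requests valid for a fixed time, and an expired lease simply stops being usable, so nothing has to be cancelled. A minimal sketch of the sender-side bookkeeping, with explicit timestamps to keep it deterministic; the Lease class and its fields are hypothetical and do not reflect ReactiveSocket's wire format.

```java
class Lease {
    private final long expiresAtMs;
    private int remaining;

    // Issued by the receiver: "you may send up to `permits` requests until grantedAtMs + ttlMs".
    Lease(int permits, long grantedAtMs, long ttlMs) {
        this.remaining = permits;
        this.expiresAtMs = grantedAtMs + ttlMs;
    }

    // Called by the sender before each request; false means "do not send, wait for a new lease".
    synchronized boolean tryUse(long nowMs) {
        if (nowMs >= expiresAtMs || remaining == 0) {
            return false;
        }
        remaining--;
        return true;
    }

    public static void main(String[] args) {
        Lease lease = new Lease(5, 0, 60_000); // "Lease" 5 requests for 1 minute
        int sent = 0;
        for (int i = 0; i < 7; i++) {
            if (lease.tryUse(1_000)) sent++;
        }
        System.out.println(sent); // 5
        // An expired lease is unusable without any explicit cancellation message.
        System.out.println(new Lease(5, 0, 60_000).tryUse(60_000)); // false
    }
}
```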
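Slides 51 to 56 size leases against server capacity. Assuming eight clients (Client 1 to Client 8) leased 10 requests per minute each out of 100 RPM, 100 - 8 × 10 = 20 RPM remain in reserve; when capacity drops, some clients keep smaller leases and others get none. The allocation policy below (hold back a reserve, then grant in strict priority order) is a hypothetical illustration; LeaseAllocator is not from the talk, and the sketch does not claim to reproduce the exact grants on slide 55.

```java
import java.util.Arrays;

class LeaseAllocator {
    // Hypothetical policy: hold back reserveRpm, then grant leases in priority order
    // (index 0 = highest priority) until the remaining capacity runs out.
    static int[] allocate(int capacityRpm, int reserveRpm, int[] requestedRpm) {
        int budget = Math.max(0, capacityRpm - reserveRpm);
        int[] granted = new int[requestedRpm.length];
        for (int i = 0; i < requestedRpm.length; i++) {
            granted[i] = Math.min(requestedRpm[i], budget);
            budget -= granted[i];
        }
        return granted;
    }

    public static void main(String[] args) {
        int[] eightClients = new int[8];
        Arrays.fill(eightClients, 10);
        // 100 RPM capacity, 20 RPM reserve: every client gets its full 10 RPM lease.
        System.out.println(Arrays.toString(allocate(100, 20, eightClients))); // [10, 10, 10, 10, 10, 10, 10, 10]
        // Capacity drops (illustrative numbers): lower-priority clients get less, or nothing.
        System.out.println(Arrays.toString(allocate(20, 5, new int[] {10, 10, 10}))); // [10, 5, 0]
    }
}
```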
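Slides 64 to 70 argue for a single business-level timeout per client request, applied at the edge, instead of thread timeouts at every hop. A minimal sketch using CompletableFuture.completeOnTimeout (Java 9+), which completes the response with a fallback when the deadline passes; EdgeSla, the 100 ms deadline and the fallback strings are assumptions. Completing the edge future this way does not, by itself, stop the downstream work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

class EdgeSla {
    // One business-level deadline, applied once at the edge to the whole composed response.
    // On expiry the client gets a degraded fallback instead of an error.
    static <T> CompletableFuture<T> withSla(CompletableFuture<T> response, long slaMs, T fallback) {
        return response.completeOnTimeout(fallback, slaMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        CompletableFuture<String> stuck = new CompletableFuture<>(); // a downstream that never answers
        System.out.println(withSla(stuck, 100, "default-row").join()); // default-row
        CompletableFuture<String> fast = CompletableFuture.completedFuture("personalized-row");
        System.out.println(withSla(fast, 100, "default-row").join()); // personalized-row
    }
}
```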
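Slides 72 to 75 show cancellations travelling down the call chain, so that the Video Metadata Service, Rating service and C* stores stop work once the edge gives up on a request. In RxJava 1.x that role is played by unsubscription; to stay dependency-free, the sketch below uses a hypothetical CancellationScope class in which each hop registers an abort action and derives a child scope for the next hop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CancellationScope {
    private final List<Runnable> actions = new ArrayList<>();
    private boolean cancelled;

    // A downstream call registers how to abort itself (close a stream, send a cancel frame, ...).
    void onCancel(Runnable action) {
        synchronized (this) {
            if (!cancelled) {
                actions.add(action);
                return;
            }
        }
        action.run(); // already cancelled: abort immediately
    }

    // A child scope for the next hop; cancelling this scope cancels the child too.
    CancellationScope child() {
        CancellationScope child = new CancellationScope();
        onCancel(child::cancel);
        return child;
    }

    void cancel() {
        List<Runnable> toRun;
        synchronized (this) {
            if (cancelled) {
                return; // idempotent
            }
            cancelled = true;
            toRun = new ArrayList<>(actions);
            actions.clear();
        }
        toRun.forEach(Runnable::run); // run callbacks outside the lock
    }

    synchronized boolean isCancelled() {
        return cancelled;
    }

    public static void main(String[] args) {
        AtomicInteger abortedQueries = new AtomicInteger();
        CancellationScope edge = new CancellationScope(); // client request at the edge
        CancellationScope metadata = edge.child();        // edge -> Video Metadata Service
        CancellationScope store = metadata.child();       // Video Metadata Service -> C* store
        store.onCancel(abortedQueries::incrementAndGet);
        edge.cancel(); // e.g. the business SLA expired or the client went away
        System.out.println(store.isCancelled() + " " + abortedQueries.get()); // true 1
    }
}
```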