Learn how the Netflix API achieves fault tolerance in a distributed architecture that depends on dozens of systems, any of which can fail at any time, while serving more than two billion web service calls each day to more than 1,000 different devices. Topics include common patterns, production examples, and operational lessons from the way Netflix builds fault and latency tolerance into its distributed systems. It does this with circuit breakers, bulkheads, and other patterns embodied in the open-source Hystrix library, and it operates them using real-time metrics and data visualization tools.
Presented at JavaOne 2013: https://oracleus.activeevents.com/2013/connect/sessionDetail.ww?SESSION_ID=2624&tclass=popup
Hystrix at Netflix: http://techblog.netflix.com/2012/11/hystrix.html
Hystrix on Github: https://github.com/Netflix/Hystrix