With the wide adoption of microservices and large-scale distributed systems, architectures have grown increasingly complex and hard to understand. Worse, the software systems running on them have become extremely difficult to debug and test, increasing the risk of outages. These new challenges call for new tools, and since failures have become more and more chaotic in nature, we must turn to chaos engineering to reveal failures before they become outages. In this talk, we will first introduce chaos engineering and show the audience how to start practicing it on the AWS cloud. We will then walk through the tools and methods they can use to inject failures into their architectures in order to make them more resilient.
Following the previous introduction to chaos engineering, in this hands-on session I will show the audience how to inject failures into software systems in practice, using a few different tools and methods such as Gremlin, Chaos Toolkit, AWS Systems Manager, AWS Lambda, and Toxiproxy.