If a microservice falls down in the middle of a server farm, does my pager make a sound? Hopefully, the answer is “yes!” But all too often, services become partially degraded in ways that are difficult to predict, and therefore difficult to monitor proactively. How can we gain confidence that the services we build are instrumented for observability in the right places, the parts that actually matter, so that we're alerted quickly when problems arise and have enough information to resolve them?
We'll look at a framework for modeling interdependent systems so we can identify the areas of our code that need to be instrumented. By isolating these key components, we'll ensure that we are writing software designed for resiliency.