It's a fact of life: software breaks.
But it's not all doom and gloom. How we detect and handle errors drastically impacts the quality of both our systems and our lives. Knowing what to track, when to page, and how to find system weaknesses is critical.
You'll leave this talk with tactics for coping with failures on multiple levels. We'll see how error handling and alerting form the foundation of a reliable system. Then we'll automate testing and, finally, deliberately induce problems in live, running code to see where our expectations and reality diverge.
Failure is inevitable, but that doesn't mean you can't fail well!