Test the full system: client, code, & provisioning code. Code reviews != tests. Have both. Continuous Integration (CI) is critical to velocity, quality, & transparency.
out your deploy process! Dependencies on other systems make this even more important. Canary testing, dark launches, feature flags, etc. are good. @Randommood
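One way a feature flag can back a canary or dark launch is a percentage rollout with stable bucketing, so the same user always gets the same answer while the percentage is raised. This is a minimal sketch, not the speaker's implementation; the function name `in_rollout`, the flag name, and the user IDs are hypothetical.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage for the flag.

    Hypothetical helper: flag names and user IDs are illustrative assumptions.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash flag + user together so each flag buckets users independently.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


if __name__ == "__main__":
    # Start the canary at 5% of users, then widen it by raising the number.
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, in_rollout("new-checkout", uid, 5))
```

Because the bucket depends only on the flag and the user, raising the percentage only ever adds users to the rollout; nobody who already saw the new path is moved back.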
Alert fatigue has a high cost; don’t let it get that far. Link alerts to a playbook. Routinely test your escalation paths. De-prioritizing Insight ✨
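"Link alerts to playbook" can be enforced with a small check over the alert definitions, for example in CI. The sketch below assumes alerts are plain dictionaries with hypothetical `name` and `playbook_url` fields; real alerting tools use their own schemas.

```python
def alerts_missing_playbook(alerts):
    """Return the names of alerts that have no playbook link attached.

    Assumes each alert is a dict with a "name" key and an optional
    "playbook_url" key (hypothetical field names).
    """
    missing = []
    for alert in alerts:
        link = (alert.get("playbook_url") or "").strip()
        if not link:
            missing.append(alert["name"])
    return missing


if __name__ == "__main__":
    # Illustrative data only; the URL is a placeholder.
    alerts = [
        {"name": "disk-full", "playbook_url": "https://wiki.example.com/playbooks/disk-full"},
        {"name": "queue-backlog", "playbook_url": ""},
    ]
    print(alerts_missing_playbook(alerts))
```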
management methods is required to succeed in today's enterprise IT environment. That customer enterprise environment never was like the simplified product development environment where Agile software development was conceived…” DUH
good inputs. Reasonable reaction to incorrect input. Time to Task (TTT) for a given # of inputs/outputs. Behavior after a given uptime. Goal for each of: Single node, Multi node, Clustered, Cache enabled.
they are public or expensive to run. Instrument / add metrics to track them. Rank your services & data (what can you drop?). Capacity analysis is not dead ✨
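Ranking services answers "what can you drop?" ahead of time. The sketch below shows one possible policy under stated assumptions: every service has a rank (1 is most critical) and a cost in made-up capacity units, and under pressure the system keeps services in strict rank order until the capacity runs out. The `Service` type and the service names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    rank: int  # 1 = most critical; larger ranks are dropped first
    cost: int  # hypothetical capacity units this service needs


def services_to_keep(services, capacity):
    """Keep services in strict rank order until capacity is exhausted."""
    kept = []
    remaining = capacity
    for svc in sorted(services, key=lambda s: s.rank):
        if svc.cost > remaining:
            break  # everything ranked below this point is shed as well
        kept.append(svc.name)
        remaining -= svc.cost
    return kept


if __name__ == "__main__":
    fleet = [
        Service("recommendations", rank=3, cost=3),
        Service("checkout", rank=1, cost=5),
        Service("search", rank=2, cost=4),
    ]
    print(services_to_keep(fleet, capacity=9))
```

Stopping at the first service that does not fit is a deliberate choice here: it keeps the policy predictable, at the price of sometimes leaving a little capacity unused.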
data, replay of messages, anti-entropy build resilience. Mechanisms to guard system resources are good to have. Your system is also tied to the resources of its dependencies.
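One simple mechanism for guarding a system resource is a cap on in-flight work that rejects new requests instead of queueing them. The sketch below is an illustration only; `ResourceGuard` and `LimitReached` are hypothetical names, and the limit value is arbitrary.

```python
import threading
from contextlib import contextmanager


class LimitReached(RuntimeError):
    """Raised when the guard has no free slots left."""


class ResourceGuard:
    """Caps concurrent use of a resource and fails fast when the cap is hit."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    @contextmanager
    def slot(self):
        # Non-blocking acquire: reject immediately rather than pile up waiters.
        if not self._slots.acquire(blocking=False):
            raise LimitReached("limit reached, rejecting work")
        try:
            yield
        finally:
            self._slots.release()


if __name__ == "__main__":
    guard = ResourceGuard(max_in_flight=2)
    with guard.slot():
        print("doing guarded work")
```

Failing fast keeps the caller's own resources free as well, which matters because a system is also tied to the resources of its dependencies.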
internal Decisions have an expiration date. Periodically re-evaluate them, as past you was much dumber. A revisionist culture produces more resilient systems ✨
patterns. Testing (full system!), Metrics & monitoring, Convergence to good state, Hazard inventories, Redundancies, Feature flags, Dark deploys, Runbooks & docs, Canaries, System verification, Formal methods, Fault injection. The goal is to build failure domain independence.
later. Think in terms of tradeoffs. TESTING MATTERS! Not all process is evil. Keep in Mind. Make system boundaries & dependencies explicit. Playbooks are your friends, have them. Use kill switches & limits. Prioritize your services. Distributed systems
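A kill switch can be as small as a flag that operators flip to route around a misbehaving code path. The sketch below is one possible shape, assuming an in-process switch; in practice the state would usually live in shared configuration. `KillSwitch` and `call_with_kill_switch` are hypothetical names.

```python
import threading


class KillSwitch:
    """In-process switch that operators can trip to disable a code path."""

    def __init__(self):
        self._tripped = threading.Event()

    def trip(self):
        self._tripped.set()

    def reset(self):
        self._tripped.clear()

    def is_tripped(self) -> bool:
        return self._tripped.is_set()


def call_with_kill_switch(switch, primary, fallback):
    """Run primary() unless the switch is tripped, in which case run fallback()."""
    if switch.is_tripped():
        return fallback()
    return primary()


if __name__ == "__main__":
    switch = KillSwitch()
    switch.trip()  # an operator disables the expensive path
    print(call_with_kill_switch(switch, lambda: "live results", lambda: "cached results"))
```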