When Dependencies Fail: Building Antifragile Applications in a Fragile World

Presented at Devnot - Devnot Summit 2025

Selçuk Usta

October 18, 2025

Transcript

  1. AGENDA
    > A Small Change, A Global Disruption
    > Why Are Modern Applications So Fragile?
    > Resilience ≠ Just Keeping Things Running
    > Why Is It Often Overlooked?
    > Storms We May Face
    > Being Prepared for the Storm
    > Planned Flexibility + Learning from Mistakes
  2. A Small Change, A Global Disruption
    On June 12, 2025, a small update created a very large impact. A policy control module had not been fully tested before the rollout, because it only failed under very specific conditions. Once triggered, it caused Google’s global infrastructure to return many ‘503’ errors. Thousands of users, business processes, and applications were affected, including production systems.
  3. Why Are Modern Applications So Fragile?
    Modern applications cannot stand on their own. Databases, message queues, third-party APIs… They are all part of a chain. But a chain is only as strong as its weakest link. And when that link breaks, the whole system breaks.
  4. Resilience ≠ Just Keeping Things Running
    > Expect Failure, Don’t Assume Stability: Every part of a system can fail. Don’t believe that databases, APIs, or queues will always work.
    > Inject Chaos in a Controlled Way: Create test situations like network delays, database timeouts, or API errors before going to production.
    > Observe, Measure, and Learn: Watch how the system reacts, collect data, and use it to improve resilience.
    > Automate Recovery and Build for Self-Healing: Use tools like failover, retries with backoff, and circuit breakers to recover automatically.
    > Balance Experiments with User Impact: Run chaos tests in safe environments so users are not harmed by experiments.
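
The "retries with backoff" and "circuit breakers" mentioned on this slide could look roughly like the sketch below. It is a minimal illustration, not code from the talk: the names `CircuitBreaker`, `CircuitOpenError`, and `retry_with_backoff`, and all default thresholds and delays, are assumptions chosen for the example.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call because the dependency looks down."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; allows a trial call after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so tests do not need to wait
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unavailable")
            # Cool-down elapsed: let one trial call through (half-open state).
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # any success closes the circuit again
        return result


def retry_with_backoff(func, attempts=4, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Retry `func` with exponential backoff and full jitter between attempts."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise                   # do not hammer a dependency the breaker gave up on
        except Exception:
            if attempt == attempts - 1:
                raise               # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A caller might combine the two as `retry_with_backoff(lambda: breaker.call(fetch_user))`, where `fetch_user` is a hypothetical function that talks to the dependency. The retry loop absorbs short glitches, while the breaker stops retries from piling onto a dependency that is clearly down.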
  5. Why Is It Often Overlooked?
    > Deadlines over Timeplans: In the triangle of quality, time, and cost, time usually becomes the main priority.
    > Invisible Dependencies: These dependencies run in the background, so the risk is often overlooked.
    > Testing Gap Between Staging and Production: Most resilience tests are done in staging, and the “real-world chaos” in production is missed.
    > Short-Term Thinking: The “quick fix” approach often leads to bigger problems in the future.
    > Comfort of Ready-to-Use Frameworks: We often believe the framework solves everything without simulating real scenarios.
  6. Storms We May Face
    DDD (Disaster-Driven Design) Principles
    > Design with failures in mind.
    > Discuss weak points early.
    > Turn risks into acceptance criteria.
    > Make resilience part of the design, not an afterthought.
  7. … lets you simulate real network issues in your test and CI pipelines. It combines controlled fault injection with chaotic scenarios, helping you prove your system has no single point of failure. Toxiproxy
  8. Planning
    > App (Basic Web API)
    > MongoDB (Database)
    > Varnish (Reverse proxy with caching)
    > Toxiproxy (Simulation proxy)
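
One way to wire this plan together is to register a Toxiproxy proxy in front of each dependency and then add a fault ("toxic") to it through Toxiproxy's HTTP API. The sketch below is an illustration under assumptions, not the demo from the talk: the host names `mongodb` and `varnish`, all ports other than Toxiproxy's default API port 8474, and the helper names are hypothetical.

```python
import json
import urllib.request

TOXIPROXY_API = "http://localhost:8474"  # Toxiproxy's default API address; adjust as needed


def proxy_definition(name, listen, upstream):
    """Body for POST /proxies: traffic sent to `listen` is forwarded to `upstream`."""
    return {"name": name, "listen": listen, "upstream": upstream, "enabled": True}


def latency_toxic(latency_ms, jitter_ms=0, stream="downstream"):
    """Body for POST /proxies/{name}/toxics that delays data by latency_ms +/- jitter_ms."""
    return {
        "type": "latency",
        "stream": stream,
        "toxicity": 1.0,  # apply to every connection
        "attributes": {"latency": latency_ms, "jitter": jitter_ms},
    }


def post_json(path, payload, base_url=TOXIPROXY_API):
    """Send a JSON POST to the Toxiproxy API and return the decoded response."""
    request = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


# Hypothetical topology: the app connects to the `listen` ports instead of the real services.
PROXIES = [
    proxy_definition("mongodb", "0.0.0.0:27018", "mongodb:27017"),
    proxy_definition("varnish", "0.0.0.0:8081", "varnish:80"),
]


def setup_slow_database():
    """Register both proxies, then make MongoDB responses about one second slower."""
    for proxy in PROXIES:
        post_json("/proxies", proxy)
    post_json("/proxies/mongodb/toxics", latency_toxic(1000, jitter_ms=200))
```

Calling `setup_slow_database()` requires a running Toxiproxy server. The payload helpers are kept separate from the network call so that the fault definitions can be reviewed and unit tested on their own.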
  9. Custom Faults
    Network testing should not focus only on latency; even small protocol-level faults can lead to major issues.
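
As one illustration of a protocol-level fault, consider data that arrives in tiny fragments, which is the kind of condition Toxiproxy's `slicer` toxic produces. The sketch below is an assumption-laden example rather than the fault shown in the talk: it uses a made-up length-prefixed message format and an in-memory `SlicedStream` in place of a real socket, and compares a naive reader with one that tolerates fragmentation.

```python
import struct


class SlicedStream:
    """In-memory stand-in for a socket whose data arrives in small pieces."""

    def __init__(self, data, chunk_size):
        self.data = data
        self.chunk_size = chunk_size

    def recv(self, max_bytes):
        # Like a real socket, return at most max_bytes, and possibly fewer.
        n = min(max_bytes, self.chunk_size)
        chunk, self.data = self.data[:n], self.data[n:]
        return chunk


def frame(payload):
    """Encode a message as a 4-byte big-endian length followed by the payload."""
    return struct.pack(">I", len(payload)) + payload


def read_message_naive(stream):
    """Assumes each recv() returns everything requested; fails on sliced data."""
    header = stream.recv(4)
    (length,) = struct.unpack(">I", header)  # raises struct.error on a short header
    return stream.recv(length)


def read_exactly(stream, size):
    """Keep reading until exactly `size` bytes have arrived or the stream ends."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = stream.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("stream closed mid-message")
        buffer.extend(chunk)
    return bytes(buffer)


def read_message(stream):
    """Fragmentation-tolerant reader for the same framing."""
    (length,) = struct.unpack(">I", read_exactly(stream, 4))
    return read_exactly(stream, length)
```

With `wire = frame(b"hello world")`, `read_message(SlicedStream(wire, 3))` returns the full payload, while `read_message_naive(SlicedStream(wire, 3))` raises `struct.error` because only three header bytes arrive in the first read. No latency is involved at all, which is the point of the slide.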
  10. THANK YOU!
    Your feedback matters, just scan the QR.
    /in/selcukusta selcukusta.com selcukusta ustasoglu selcukusta (at)gmail.com