We Built for Predictability; The Workloads Didn’t Care

Infrastructure engineering has been built on predictability. If you specify the state, enforce it consistently, and eliminate drift, the system behaves as expected. Determinism was the goal and configuration management gave us the tools to get there. However, today’s AI models, agentic systems, and other probabilistic workloads break that mental model. You can deliver a perfectly reproducible environment and still see the application layer behave differently from run to run. The foundations didn’t fail; the workloads simply play by different rules.

This talk is about how to reason in that new world. When outcomes aren’t guaranteed, what can you rely on? How do you decide what must stay deterministic, and where you can safely allow randomness? How do you operate and troubleshoot when “it depends” becomes even more of a normal, expected answer? Instead of fighting to make probabilistic systems behave deterministically, we’ll explore how infrastructure engineers can build clarity, confidence, and reliability by embracing a different way of thinking: one that treats unpredictability as a property we can work with rather than a flaw we must eliminate.

Michael Stahnke

February 02, 2026

Transcript

  1. We Built for Predictability; The Workloads Didn’t Care
    Michael Stahnke, VP of Stuff at Flox (flox.dev), @stahnma
  2. Desired State is a Myth*
    * Save for small values of desired and state
  3. We built an entire industry on the "Fixed Point"
    If the manifest is correct, the system is correct.
  4. So we took the most difficult part of computers, replicated it, and made it super easy to use at levels never imagined.
  5. Kernighan's Law: Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
  7. Some LLM> I'd like the download area to show the version you're downloading. Right now, it's not clear at all.
  10. Who would win?
    Add a version badge to a website
    Rewrite in Go a production Perl application that has been running for more than 20 years
  11. Who would win?
    Add a version badge to a website: took about 6 hours of iteration, a Terraform change, multiple approach changes, and edits to something like 10 files. Tried multiple models, etc.
    Rewrite a 20-year-old production Perl application in Go: one-shot (about 10 minutes) got > 80% of behavior correct, and the happy path was 100%.
  12. The Three Pillars of Certainty
    Idempotency: Safety in repetition. Apply the same operation multiple times with the same result.
    Commutativity: Order shouldn't matter. The "Holy Grail" of multi-node configuration management.
    Hermeticity: Total isolation from the host. No external dependencies or side effects.
  13. The Intrusion of Probability
    The shift from "Static" to "Agentic"
    The environment is reproducible; the application behavior is not
    Deterministic foundations meeting probabilistic workloads
  14. Troubleshooting the "It Depends"
    Observability Dashboard: ⚠ Boundary Violation, ℹ Behavior within bounds
    Observability over Enforcement
    From "Is the file there?" to "Is the behavior within bounds?"
  15. Since we can't guarantee f(x)=y, we must prove y stayed within acceptable bounds.
  16. For 20 years, we wanted ‘==’ for everything. Now we must embrace ‘∈’ (member of).
  17. y is the output of your probabilistic workload
    ∈ is "a member of"
    Ωsuccess is the "Sample Space", or the set of all acceptable/safe outcomes
    As in: y ∈ Ωsuccess
  18. Alternatively, a run is out of bounds when ‖ y - ŷ ‖ > ε
    y: what actually happened
    ŷ: what you wanted to happen
    ε: the allowed distance between them (error budget)
  19. In a probabilistic world, a "failed" run isn't necessarily a configuration error. It might just be the tail end of a distribution curve.
  20. Workload Evaluation Criteria
    Standard Health Check: Is the process running? Is the port open? (Deterministic)
    WEC: Is the output within the expected "Confidence Interval"? (Probabilistic)
  21. Workload Evaluation Criteria
    Build Statistical Alarms: Don't alert on a single P(Fail). Alert when the Shape of Success changes (e.g., the mean response time drifts or the hallucination rate spikes).
  22. Chaos is a property we work with.
    Is it correct vs Is it in bounds?
  24. Is it correct vs Is it in bounds?
    You should look at the output distribution and test for it.
  25. Is it correct vs Is it in bounds?
    Shift from Unit Testing our infra to Statistical Testing our workloads.
  26. Is it correct vs Is it in bounds?
    If the LLM returns a hallucination, your Puppet run didn't fail. Your boundary did.
  27. The Anchor Principle
    If the Workload is a variable, the Environment must be a constant.
    You cannot reliably debug a probabilistic application on a mutable substrate.
  29. The new CI/CD Pipeline
    Step 1: Provision a Hermetic Environment.
    Step 2: Run the Probabilistic Workload n times.
    Step 3: Measure the Probability of Success (P).
  30. The Deployment Guide
    We don't ship because "the build passed." We ship because "the success distribution in this environment remains stable."
    Your infrastructure tool’s job is to ensure Environment consistency so the Success Score is valid.
  31. Summary of Enforcement
    Pin the Substrate (recommended): Use Hermetic tools (I may suggest Flox) to lock the environment.
    Define the Curve: Establish a baseline for Normal Randomness.
    Monitor the Shape: Alert on shifts in probability, not just binary failures.
  32. Closing
    Determinism was the childhood. Probability is our adulthood.
    We must build the hermetic foundations and work to make randomness safe.
  33. Final Thoughts
    When the code has unknown quality and risk levels, we need to treat it as hostile.
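
The three steps on "The new CI/CD Pipeline" slide (run the workload n times inside a fixed environment, check each output against its bounds, measure P) might be sketched roughly as follows. This is a minimal illustration and not code from the talk: every function name here is hypothetical, the workload is a stand-in Gaussian sampler, the scalar distance |y - ŷ| stands in for ‖ y - ŷ ‖, and the Wilson score interval plus the `tolerance` parameter are assumptions chosen only to make "the success distribution remains stable" concrete.

```python
import math
import random


def in_bounds(y: float, y_hat: float, eps: float) -> bool:
    """True when |y - y_hat| <= eps, i.e. the run landed in the success set."""
    return abs(y - y_hat) <= eps


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def measure_success(workload, n: int, y_hat: float, eps: float):
    """Run the workload n times; return (successes, observed P, Wilson lower bound)."""
    successes = sum(in_bounds(workload(), y_hat, eps) for _ in range(n))
    return successes, successes / n, wilson_lower_bound(successes, n)


def should_ship(lower_bound: float, baseline_p: float, tolerance: float = 0.05) -> bool:
    """Hypothetical gate: ship only if P has not dropped meaningfully below baseline."""
    return lower_bound >= baseline_p - tolerance


if __name__ == "__main__":
    # Stand-in for a probabilistic workload: a noisy value centred on 10.0.
    rng = random.Random(42)
    successes, p, lower = measure_success(
        lambda: rng.gauss(10.0, 1.0), n=500, y_hat=10.0, eps=2.0
    )
    print(f"successes={successes} P={p:.3f} wilson_lower={lower:.3f}")
    print("ship" if should_ship(lower, baseline_p=0.95) else "hold")
```

The gate compares a lower confidence bound, rather than the raw success rate, against the baseline, so a single unlucky run from the tail of the distribution does not block a release, while a sustained shift in the shape of success does. The measurement is only meaningful if every one of the n runs happens in the same pinned environment.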