Bucharest Tech Week 2026 - Reinventing testing practices in the AI era

AI-infused applications demand a rethinking of our testing practices.

Developers face a new class of challenge as LLMs become standard integration points in modern applications: non-deterministic behavior that traditional testing approaches were never designed to handle. The current wave of distributed, orchestrated, agentic AI systems is evolving fast and, if we're being honest, it smells a lot like the early days of microservices.

In this session, we'll explore how your DevOps and testing practices must evolve when you wire AI into your applications. Not all AI failures look the same, and recognizing the difference is the first step toward building systems you can actually trust.

We'll walk through practical testing and observability strategies, using open source tools that give you confidence in AI-infused applications at every layer of the stack.

You'll leave with a concrete mental model for reasoning about AI failures and one grounding question: What if AI was just an API call?

Eric Deandrea

June 18, 2026

Transcript

  1. @edeandrea Java??? 😯 … no seriously … why not Python? 🤔
    Because we are not data scientists. We integrate existing models into enterprise-grade systems and applications.
    Do you really want to do these in Python?
    • Transactions
    • Security
    • Scalability
    • Observability
  2. I don’t care if it works on your Jupyter notebook.
    We are not shipping your Jupyter notebook.
  3. Who am I?
    • Java Champion
    • 27+ years software development experience
    • Works on Open Source projects
      ◦ Quarkus
      ◦ LangChain4j, Quarkus LangChain4j
      ◦ Docling Java
      ◦ Langfuse Java, Quarkus Langfuse
      ◦ Spring Boot, Spring Framework, Spring Security
      ◦ Testcontainers
      ◦ Wiremock
      ◦ Microcks
    • Boston Java Users ACM Chapter Vice Chair & Board Member
    • Published Author
    • Cat lover
  4. https://red.ht/quarkus-spring-devs
    • Showcase & explain Quarkus, how it enables modern Java development & the Kubernetes-native experience
    • Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus
    • Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices
  5. What are you hoping to learn here?
    What are you going to leave with?
  6. What’s happening in industry?
    • Standardization
      ◦ Or lack thereof (lots of competing standards)?
    • Distributed
    • Orchestrated
    • Agentic
    • Agents
    • Agentic Agents
    • Autonomous Agents
    • Autonomous Agentic Agents
    Smells like microservices?
  7. DevOps Evolution
    [Diagram: loops labelled Dev, Ops, Data, and ML. Stage labels: Plan, Code, Build, Test, Release, Deploy, Operate, Monitor; Train, Evaluate, Deploy; Collect, Evaluate, Curate, Analyze]
  8. What’s the difference between these?
    Integration Points
    • CRUD application: Application → Database
    • Microservice: Application → Service
    • AI-Infused application: Application → Model
  9. What’s the difference between these? What do we do?
    Integration Points
    • CRUD application: Application → Database
    • Microservice: Application → Service
    • AI-Infused application: Application → Model
  11. What’s the difference between these?
    Integration Points
    • CRUD application: Application → Database
    • Microservice: Application → Service
    • AI-Infused application: Application → Model
    Observability (metrics, tracing, logs, auditing)
    Fault Tolerance (timeout, bulkhead, circuit breaker, rate limiting, fallbacks, …)
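One way to read the fault tolerance item above is that a model call gets the same guards as any other remote integration point. The sketch below illustrates a timeout plus a fallback using only the JDK. `GuardedModelCall`, `callWithTimeoutAndFallback`, and `slowModel` are hypothetical names invented for this example, not part of Quarkus or LangChain4j, and a real application would more likely rely on a fault tolerance library than on hand-rolled code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical wrapper that treats a model call like any other remote call. */
final class GuardedModelCall {

    /**
     * Runs the call asynchronously and returns the fallback if the call
     * does not finish within the timeout or if it throws.
     */
    static String callWithTimeoutAndFallback(Supplier<String> modelCall,
                                             long timeoutMillis,
                                             String fallback) {
        return CompletableFuture.supplyAsync(modelCall)
                // Timeout: complete with the fallback if no answer arrives in time.
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                // Fallback: replace any failure of the call with the fallback value.
                .exceptionally(error -> fallback)
                .join();
    }

    /** Stand-in for a slow model; a real client would make a network call here. */
    static String slowModel() {
        try {
            Thread.sleep(2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "late answer";
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeoutAndFallback(() -> "fast answer", 1_000, "fallback"));
        System.out.println(callWithTimeoutAndFallback(GuardedModelCall::slowModel, 100, "fallback"));
    }
}
```

The slow call keeps running in the background after the timeout; this sketch only stops waiting for it, which is usually acceptable for a read-only request but worth knowing when calls are billed per token.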
  12. [Diagram: testing pyramid with unit tests, integration tests, and end-to-end tests, on an axis from low effort to high realism. Annotations: tests with application server, test REST endpoints, tests using AI]
  13. Observability
    Collect metrics
      - Exposed as Prometheus
      - Track token usage & cost
    OpenTelemetry Tracing
      - Trace interactions with the LLM
    Auditing
      - Keep track of interactions with the LLM
      - Ability to replay & re-score interactions
    Continuous evaluation
      - Evaluate interactions in real time
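The metrics item above can be pictured as a decorator that sits around the model call and records what happened without changing the call itself. This is a minimal sketch under stated assumptions: `MeteredModel` and `ModelResponse` are hypothetical types invented for the example, the token counts are assumed to come back with each response, and in-memory counters stand in for whatever metrics or tracing library the application actually uses.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

/** Hypothetical decorator that records metrics around a model call. */
final class MeteredModel implements Function<String, MeteredModel.ModelResponse> {

    /** Hypothetical response shape; real clients report usage in their own types. */
    record ModelResponse(String text, int inputTokens, int outputTokens) {}

    private final Function<String, ModelResponse> delegate;
    private final LongAdder calls = new LongAdder();
    private final LongAdder failures = new LongAdder();
    private final LongAdder tokens = new LongAdder();
    private final LongAdder nanos = new LongAdder();

    MeteredModel(Function<String, ModelResponse> delegate) {
        this.delegate = delegate;
    }

    @Override
    public ModelResponse apply(String prompt) {
        long start = System.nanoTime();
        calls.increment();
        try {
            ModelResponse response = delegate.apply(prompt);
            tokens.add(response.inputTokens() + response.outputTokens());
            return response;
        } catch (RuntimeException e) {
            failures.increment();
            throw e; // metrics are recorded, the failure still reaches the caller
        } finally {
            nanos.add(System.nanoTime() - start);
        }
    }

    long calls() { return calls.sum(); }
    long failures() { return failures.sum(); }
    long totalTokens() { return tokens.sum(); }

    /** Average wall-clock time per call in milliseconds, or 0 before the first call. */
    double averageLatencyMillis() {
        long n = calls.sum();
        return n == 0 ? 0.0 : nanos.sum() / 1_000_000.0 / n;
    }

    public static void main(String[] args) {
        // Stubbed model so the sketch runs without any network access.
        MeteredModel model = new MeteredModel(p -> new ModelResponse("stubbed answer", 12, 30));
        model.apply("What is Quarkus?");
        System.out.println("calls=" + model.calls() + " tokens=" + model.totalTokens());
    }
}
```

Because the wrapper implements the same interface as the thing it wraps, application code does not need to know it is being measured.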
  14. Rescoring - Evaluation
    https://docs.quarkiverse.io/quarkus-langchain4j/dev/testing.html#_evaluation
    1. Sample
      ◦ The test case containing input parameters & expected output.
    2. Function under test
      ◦ The function being evaluated. Receives input parameters & produces an actual output.
    3. Evaluation Strategy
      ◦ Logic that determines if the actual output is acceptable based on the expected output.
    4. Evaluation Result
      ◦ Outcome (pass/fail), score, explanation, and metadata from the evaluation.
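The four pieces listed above can be sketched in plain Java to show how they fit together. This is an illustration of the concepts only, not the Quarkus LangChain4j testing API documented at the URL above: `EvaluationSketch`, `Sample`, `EvaluationStrategy`, `EvaluationResult`, `CONTAINS_EXPECTED`, and `passRate` are all names assumed for this example. Reporting a pass rate rather than a single pass/fail is also an assumption, chosen because it lets a test assert a threshold when outputs are non-deterministic.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical, framework-free sketch of sample-based evaluation. */
final class EvaluationSketch {

    /** 1. Sample: input plus the expected output. */
    record Sample(String input, String expectedOutput) {}

    /** 4. Evaluation result: outcome, score, and explanation. */
    record EvaluationResult(boolean passed, double score, String explanation) {}

    /** 3. Evaluation strategy: decides whether the actual output is acceptable. */
    interface EvaluationStrategy {
        EvaluationResult evaluate(Sample sample, String actualOutput);
    }

    /** Passes when the actual output contains the expected text, ignoring case. */
    static final EvaluationStrategy CONTAINS_EXPECTED = (sample, actual) -> {
        boolean ok = actual.toLowerCase(Locale.ROOT)
                .contains(sample.expectedOutput().toLowerCase(Locale.ROOT));
        return new EvaluationResult(ok, ok ? 1.0 : 0.0,
                ok ? "expected text found" : "expected text missing");
    };

    /** Runs 2. the function under test on every sample and returns the share that passed. */
    static double passRate(List<Sample> samples,
                           Function<String, String> functionUnderTest,
                           EvaluationStrategy strategy) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        long passed = samples.stream()
                .map(sample -> strategy.evaluate(sample, functionUnderTest.apply(sample.input())))
                .filter(EvaluationResult::passed)
                .count();
        return (double) passed / samples.size();
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample("What is the capital of France?", "Paris"),
                new Sample("What is the capital of Italy?", "Rome"));
        // Stub standing in for the AI-backed function under test.
        Function<String, String> stub =
                question -> question.contains("France") ? "The capital is Paris." : "It is Rome.";
        System.out.println("pass rate: " + passRate(samples, stub, CONTAINS_EXPECTED));
    }
}
```

A substring check is the simplest possible strategy; stricter or fuzzier strategies can be swapped in behind the same interface without touching the samples or the function under test.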
  17. Actual takeaways
    • Naming things is still the hardest thing in computer science
    • Java is still relevant
    • Remember the testing pyramid! Use appropriate tools at each level!
    • LangChain4j & Quarkus are awesome! They provide foundational building blocks!
    • Don’t build observability into your apps - build it around your apps
    • Test in production!
    • Write tests, expect change and failure, deploy often
    • AI is just an API call