

Jfokus 2025 - Non-deterministic? No problem! You can test it!

https://www.jfokus.se/talks/2070

Testing is hard, which is why developers tend to avoid it. Testing non-deterministic things is even harder, which is unfortunate, since we're all writing AI-infused applications, and AI models are notoriously non-deterministic. What happens when the applications start using advanced features, such as RAG, tools, and agents? How do you test these applications? There must be some tools, technologies, and practices out there that can help, while not costing your organization lots of money!

Join Java Champions Oleg & Eric in this session as they explore some of these tools & technologies, such as Testcontainers, LangChain4j, Quarkus, and Ollama. They’ll bring together Oleg’s Testcontainers knowledge and Eric’s testing obsessions, getting hands-on and showing how you can incorporate these tools and technologies into your inner and outer loop processes.

You’ll see how effortlessly Quarkus integrates with Testcontainers, and how Testcontainers can be used in conjunction with popular LLMs when writing tests. You’ll also learn about how to use containers to extend your testing into your CI environments, so you can always be sure that if your tests are green you’re good to go!

Eric Deandrea

February 04, 2025

Transcript

  1. @shelajev @edeandrea • Java Champion • 25+ years software development

    experience • Contributor to Open Source projects: Quarkus; Spring Boot, Spring Framework, Spring Security; LangChain4j (& Quarkus LangChain4j); WireMock; Microcks • Boston Java Users ACM Chapter Board Member • Published Author • Cat lover • Black belt in martial arts • About Us
  2. @shelajev @edeandrea • Showcase & explain Quarkus, how it enables

    modern Java development & the Kubernetes-native experience • Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus • Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices • https://red.ht/quarkus-spring-devs
  3. @shelajev @edeandrea • Surprisingly, also, a Java Champion • 18+

    years software development experience • ~11 years Developer Advocate • Loves to stare at the code of Open Source projects: Quarkus, Spring Boot, LangChain4j, Microcks, Testcontainers (sometimes contributes bugs too!) • About Us
  4. @shelajev @edeandrea What are you hoping to learn here? What

    are you going to leave with?
  5. @shelajev @edeandrea What is AI right now? Neural Networks •

    Recognize, Predict, and Generate text • Trained on VERY large corpora of text • Deduce the statistical relationships between tokens • Can be fine-tuned • Different models have varying capabilities • An LLM predicts the next token based on its training data and statistical deduction
  6. @shelajev @edeandrea The L of LLM == Large Llama 3.3:

    - 70B parameters - Trained on > 15T publicly-available & > 25M synthetically-generated tokens - 128K token window - 43 GB on disk DeepSeek R1: - 671B parameters - Trained on > 14.8T tokens - 32K token window - 404 GB on disk https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJwbB-ooYvzhCHaHcNUiA0_hY/edit?usp=sharing
  7. @shelajev @edeandrea [Architecture diagram of the demo application] Components:

    Chat Bot • Web Socket • Claim AI Assistant • Claim Status Notification • Tool invocation • Generate Email AI Assistant • Output Guardrails • Politeness AI Assistant • RAG Retrieval • Input Guardrails • Legend: AI replacing humans, AI replacing software, Code I write, Is this code? • Code: https://github.com/edeandrea/non-deterministic-no-problem
  8. @shelajev @edeandrea DevOps Evolution Dev Ops Release Deploy Operate Monitor

    Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML
  9. @shelajev @edeandrea CRUD application: Application → Database •

    Microservice: Application → Service • AI-Infused application: Application → Model • What’s the difference between these?
  10. @shelajev @edeandrea CRUD application: Application → Database •

    Microservice: Application → Service • AI-Infused application: Application → Model • Integration Points • What’s the difference between these?
  11. @shelajev @edeandrea Signal from tests: - stuff needs fixing -

    confident to release Purpose of tests: ❌ - prevent breaking prod ✅ - continuously improve your app
  12. @shelajev @edeandrea CRUD application: Application → Database •

    Microservice: Application → Service • AI-Infused application: Application → Model • Integration Points • Observability (metrics, tracing, auditing) • Fault Tolerance (timeout, circuit-breaker, non-blocking, rate limiting, fallbacks, …) • What’s the difference between these?
  13. @shelajev @edeandrea Guardrails Prompt: Please return a JSON document in

    the following format: { "name": "String", "countryOfOrigin": "String" } Response: Here is your JSON: ```json { "name": "Eric", "countryOfOrigin": "USA" } ``` 👿 😱 Just give me the JSON!! 😭
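One way to cope with the response on this slide is to strip the Markdown fence before parsing. The following is a minimal plain-Java sketch of that idea; `JsonFenceStripper` and `extractJson` are hypothetical names made up for illustration and are not part of LangChain4j or Quarkus.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not a LangChain4j or Quarkus API. */
class JsonFenceStripper {
    // Three backticks, built with repeat() so this listing contains no literal fence.
    private static final String FENCE = "`".repeat(3);

    // Opening fence, optional language tag, captured payload, closing fence.
    private static final Pattern FENCED = Pattern.compile(
            Pattern.quote(FENCE) + "[a-zA-Z]*\\s*(.*?)\\s*" + Pattern.quote(FENCE),
            Pattern.DOTALL);

    /** Returns the fenced payload if a fence is present, otherwise the trimmed response. */
    static String extractJson(String response) {
        Matcher m = FENCED.matcher(response);
        return m.find() ? m.group(1) : response.trim();
    }

    public static void main(String[] args) {
        String response = "Here is your JSON: " + FENCE + "json\n"
                + "{ \"name\": \"Eric\", \"countryOfOrigin\": \"USA\" }\n" + FENCE;
        System.out.println(extractJson(response));
    }
}
```

Stripping the wrapper is only one option. Rejecting the response and asking the model again is another, and that is where guardrails come in.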
  14. @shelajev @edeandrea Guardrails - Functions used to validate the input

    and output of the model - Detect invalid input or output - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at first failure
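The "chain of guardrails, sequential, stop at first failure" behaviour can be sketched in plain Java as follows. This is an illustration under assumptions, not the Quarkus LangChain4j guardrail API: the `Guardrail` interface, the `GuardrailChain` class, and the two sample checks are all hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of a sequential guardrail chain; not a real library API. */
class GuardrailChain {
    /** A guardrail returns an empty Optional on success, or a failure reason. */
    interface Guardrail extends Function<String, Optional<String>> {}

    private final List<Guardrail> guardrails;

    GuardrailChain(List<Guardrail> guardrails) {
        this.guardrails = List.copyOf(guardrails);
    }

    /** Runs the guardrails in order and returns the first failure, if any. */
    Optional<String> validate(String text) {
        for (Guardrail guardrail : guardrails) {
            Optional<String> failure = guardrail.apply(text);
            if (failure.isPresent()) {
                return failure; // stop at first failure; later guardrails never run
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Guardrail notBlank = t -> t.isBlank() ? Optional.of("empty response") : Optional.empty();
        Guardrail looksLikeJson = t -> t.trim().startsWith("{")
                ? Optional.empty()
                : Optional.of("not a JSON object");

        GuardrailChain chain = new GuardrailChain(List.of(notBlank, looksLikeJson));
        System.out.println(chain.validate("Here is your JSON: {}"));
        System.out.println(chain.validate("{ \"name\": \"Eric\" }"));
    }
}
```

Ordering cheap checks before expensive ones (for example, a format check before a hallucination check that needs another model call) keeps the common failure path fast.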
  15. @shelajev @edeandrea Retry and Reprompt Output guardrails can have 4

    different outcomes: - Success - Response is passed to the caller or next guardrail - Fatal - Stop and throw an exception - Retry - Call the model again with the same context (we never know ;-)) - Reprompt - Call the model again with an additional message indicating how to fix the response
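The four outcomes can be read as a small control loop around the model call. The sketch below is a plain-Java illustration under assumptions: `Outcome`, `Result`, and `call` are hypothetical names, the "model" is just a function from message history to a response, and the attempt limit is an addition of this sketch rather than something stated on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of handling output guardrail outcomes; not a real library API. */
class OutputGuardrailLoop {
    enum Outcome { SUCCESS, FATAL, RETRY, REPROMPT }

    /** Result of one guardrail check; the message is used for FATAL and REPROMPT. */
    record Result(Outcome outcome, String message) {}

    /** Calls the model until the guardrail succeeds, fails fatally, or attempts run out. */
    static String call(Function<List<String>, String> model,
                       Function<String, Result> guardrail,
                       List<String> messages,
                       int maxAttempts) {
        List<String> context = new ArrayList<>(messages);
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String response = model.apply(List.copyOf(context));
            Result result = guardrail.apply(response);
            switch (result.outcome()) {
                case SUCCESS -> { return response; }
                case FATAL -> throw new IllegalStateException(result.message());
                case RETRY -> { /* same context, simply ask again */ }
                case REPROMPT -> context.add(result.message()); // tell the model how to fix it
            }
        }
        throw new IllegalStateException("Guardrail still failing after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Stand-in model: answers correctly only once it has been reprompted.
        Function<List<String>, String> fakeModel = ctx -> ctx.size() > 1
                ? "{ \"name\": \"Eric\" }"
                : "Here is your JSON: { \"name\": \"Eric\" }";
        Function<String, Result> jsonOnly = r -> r.startsWith("{")
                ? new Result(Outcome.SUCCESS, null)
                : new Result(Outcome.REPROMPT, "Return only the JSON document, with no surrounding text.");

        System.out.println(call(fakeModel, jsonOnly, List.of("Please return a JSON document"), 3));
    }
}
```

Bounding the number of attempts matters because every retry or reprompt is another model call, with its own latency and token cost.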
  16. @shelajev @edeandrea Observability Collect metrics - Exposed in Prometheus format -

    Track token usage & cost OpenTelemetry Tracing - Trace interactions with the LLM Auditing - Keep track of interactions with the LLM - Can be persisted - Implemented by the application code
  17. @shelajev @edeandrea AI and CI

    name: build-and-test
    on:
      push:
      pull_request:
    jobs:
      jvm-build-test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - name: Setup Java
            uses: actions/setup-java@v4
            with:
              java-version: 21
              distribution: temurin
              cache: maven
          - name: Build and test
            run: ./mvnw clean verify

    [INFO] -------------------------------------------------------
    [INFO] T E S T S
    [INFO] -------------------------------------------------------
    [INFO] Running org.ericoleg.ndnp.ai.guardrail.CompositeOutputGuardrailTests
    INFO [org.tes.DockerClientFactory] (build-6) Testcontainers version: 1.20.4
    INFO [org.tes.ima.PullPolicy] (build-6) Image pull policy will be performed by: DefaultPullPolicy()
    INFO [tc.ollama/ollama:latest] (build-28) Pulling docker image: ollama/ollama:latest. Please be patient; this may take some time but only needs to be done once.
    INFO [tc.ollama/ollama:latest] (docker-java-stream--32075139) Pulling image layers: 1 pending, 3 downloaded, 3 extracted, (1 GB/? MB)
    INFO [tc.ollama/ollama:latest] (docker-java-stream--32075139) Pull complete. 4 layers, pulled in 27s (downloaded 1 GB at 55 MB/s)
    INFO [tc.ollama/ollama:latest] (build-28) Image ollama/ollama:latest pull took PT28.552217137S
    INFO [tc.ollama/ollama:latest] (build-28) Creating container for image: ollama/ollama:latest
    INFO [tc.ollama/ollama:latest] (build-28) Container ollama/ollama:latest is starting: f2e61ad1b3490bec2f69db44ee0bd946d543c703fd3f30a0c507ac0b9c5db9a1
    INFO [tc.ollama/ollama:latest] (build-28) Container ollama/ollama:latest started in PT0.681272661S
    INFO [io.qua.lan.oll.dep.dev.OllamaDevServicesProcessor] (build-28) Dev Services for Ollama started.
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-6) Pulling model llama3.2
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 0.01%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 27.39%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 60.84%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 94.33%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 99.43%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-4-Worker-0) Verifying and cleaning up
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-6) Pulling model snowflake-arctic-embed
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Downloading snowflake-arctic-embed - Progress: 1.90%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Downloading snowflake-arctic-embed - Progress: 94.85%
    INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Verifying and cleaning up
    INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Ingesting documents from path: src/main/resources/policies, path matcher = glob:**, recursive = true
    INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Ingested 1 files as 2 documents
    INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Writing embeddings to /home/runner/work/non-deterministic-no-problem/non-deterministic-no-problem/easy-rag-embeddings.json
    INFO [io.quarkus] (main) non-deterministic-no-problem 1.0 on JVM (powered by Quarkus 3.17.7) started in 68.656s. Listening on: http://0.0.0.0:8081
    INFO [io.quarkus] (main) Profiles test,ollama activated.
    INFO [io.quarkus] (main) Installed features: [agroal, awt, cdi, config-yaml, hibernate-orm, hibernate-orm-panache, jdbc-h2, langchain4j, langchain4j-easy-rag, langchain4j-ollama, langchain4j-ollama-dev-service, langchain4j-openai, langchain4j-websockets-next, mailer, mailpit, micrometer, narayana-jta, opentelemetry, playwright, poi, quinoa, qute, rest, rest-client, rest-client-jackson, rest-jackson, smallrye-context-propagation, smallrye-health, vertx, websockets-next]
    ...
    [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.79 s -- in org.ericoleg.ndnp.ai.guardrail.CompositeOutputGuardrailTests
    [INFO] Running org.ericoleg.ndnp.resources.ClaimWebsocketChatBotTests
    ...
    INFO [io.qua.lan.eas.run.EasyRagRecorder] (main) Reading embeddings from /home/runner/work/non-deterministic-no-problem/non-deterministic-no-problem/easy-rag-embeddings.json
    ...
  18. @shelajev @edeandrea • Like static analysis ◦ Are we getting

    better or worse over time? • Need to be able to monitor/track Systematic Eval: are you getting better or worse? https://docs.quarkiverse.io/quarkus-langchain4j/dev/testing.html
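The "are you getting better or worse?" question can be made concrete by sampling the model several times, scoring each response, and comparing the pass rate against a stored baseline. The sketch below shows that arithmetic in plain Java. It is an illustration only: `EvalTracker`, `passRate`, and `regressed` are hypothetical names, the model is a canned stand-in, and the baseline and tolerance values are made up. It does not reproduce the testing support described at the linked Quarkus LangChain4j page.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch of tracking an evaluation score over time; not a real library API. */
class EvalTracker {
    /** Fraction of sampled responses that satisfy the check, between 0.0 and 1.0. */
    static double passRate(Supplier<String> model, Predicate<String> check, int samples) {
        if (samples <= 0) {
            throw new IllegalArgumentException("samples must be positive");
        }
        int passed = 0;
        for (int i = 0; i < samples; i++) {
            if (check.test(model.get())) {
                passed++;
            }
        }
        return (double) passed / samples;
    }

    /** True when the current score has dropped below the baseline by more than the tolerance. */
    static boolean regressed(double baseline, double current, double tolerance) {
        return current < baseline - tolerance;
    }

    public static void main(String[] args) {
        // Stand-in for a non-deterministic model: cycles through canned responses.
        List<String> canned = List.of("{\"ok\":true}", "Sure! {\"ok\":true}", "{\"ok\":true}", "{\"ok\":true}");
        int[] next = {0};
        Supplier<String> fakeModel = () -> canned.get(next[0]++ % canned.size());

        double rate = passRate(fakeModel, r -> r.startsWith("{"), 8);
        // Baseline and tolerance are illustrative values, not recommendations.
        System.out.println("pass rate: " + rate + ", regressed: " + regressed(0.90, rate, 0.05));
    }
}
```

Because a single response proves little for a non-deterministic system, the unit being tracked here is a rate over many samples, with a tolerance band instead of an exact match.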
  19. @shelajev @edeandrea • Quarkus is awesome! Get the simple problems

    out of the way first. • Don’t forget your craft: the DevOps process is there to help, so write tests, expect change and failure, and deploy often. • Local models are fun, but unless you’re an expert, starting with expensive but powerful models is a good rule of thumb. Evaluate switching to dumber models later. • Models are just like all other software: package them into containers and run them like everything else. Actual takeaways
  20. @shelajev @edeandrea How do I develop with and use containers?

    How do I find and share container images? How do I build compliant container images? How do I make my image builds faster? How can I run my resource-heavy services? Streamline your development practice A suite of solutions supporting great developer experiences with enterprise control docker.com