Jfokus 2025 - Non-deterministic? No problem! You can test it!

@shelajev @edeandrea Eric Deandrea, Java Champion Oleg Šelajev, Java Champion
Non-deterministic? No problem! You can test it!

@shelajev @edeandrea • Java Champion • 25+ years software development
experience • Contributor to Open Source projects Quarkus Spring Boot, Spring Framework, Spring Security LangChain4j (& Quarkus LangChain4j) Wiremock Microcks • Boston Java Users ACM Chapter Board Member • Published Author • Cat lover • Black belt in martial arts About Us

@shelajev @edeandrea • Showcase & explain Quarkus, how it enables
modern Java development & the Kubernetes-native experience • Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus • Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices 3 https://red.ht/quarkus-spring-devs

@shelajev @edeandrea • Surprisingly, also, a Java Champion • 18+
years software development experience • ~11 years Developer Advocate • Loves to stare at the code of Open Source projects Quarkus Spring Boot LangChain4j Microcks Testcontainers (sometimes contributes bugs too!) About Us

@shelajev @edeandrea What are you hoping to learn here? What
are you hoping to learn here? What are you going to leave with?

@shelajev @edeandrea What is AI right now? Neural Networks •
Recognize, Predict, and Generate text • Trained on a VERY large corpuses of text • Deduce the statistical relationships between tokens • Can be ﬁne-tuned • Different models have varying capabilities An LLM predicts the next token based on its training data and statistical deduction

@shelajev @edeandrea The L of LLM == Large LLama 3.3:
- 70B parameters - Trained on > 15T publicly-available & > 25M synthetically-generated tokens - 128K token window - 43 Gb on disk DeepSeek R1: - 671B parameters - Trained on > 14.8T tokens - 32K token window - 404Gb on disk https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJwbB-ooYvzhCHaHcNUiA0_hY/edit?usp=sharing

@shelajev @edeandrea The rise of AI developer AI developers

@shelajev @edeandrea

@shelajev @edeandrea AI replacing humans

@shelajev @edeandrea AI replacing software

@shelajev @edeandrea https://www.youtube.com/watch?v=y57wwucbXR8

@shelajev @edeandrea https://github.com/edeandrea/non-deterministic-no-problem non-deterministic-no-problem

@shelajev @edeandrea Chat Bot Web Socket Claim AI Assistant Claim
Status Notiﬁcation Tool invocation Generate Email AI Assistant Output Guardrails Politeness AI Assistant AI replacing humans AI replacing software https://github.com/edeandrea/non-deterministic-no-problem non-deterministic-no-problem Code I write Is this code? Legend RAG Retrieval Input Guardrails

@shelajev @edeandrea How does your DevOps evolve when you infuse
your applications with AI?

@shelajev @edeandrea DevOps Evolution Dev Ops Release Deploy Operate Monitor
Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML

@shelajev @edeandrea @shelajev @edeandrea Vanilla AI

@shelajev @edeandrea Application Database Application Service CRUD application Microservice Application
Model AI-Infused application What’s the difference between these?

Model AI-Infused application Integration Points What’s the difference between these?

@shelajev @edeandrea Testing AI Replacing Humans

@shelajev @edeandrea Testing AI Replacing Humans 23

@shelajev @edeandrea Rethink your approach

@shelajev @edeandrea Signal from tests: ❌ - stuff needs ﬁxing
✅ - conﬁdent to release

@shelajev @edeandrea Signal from tests: - stuff needs ﬁxing -
conﬁdent to release Purpose of tests: ❌ - prevent breaking prod ✅ - continuously improve your app

@shelajev @edeandrea https://www.upworthy.com/prankster-tricks-a-gm-dealership-chatbot-to-sell-him-a-76000-chevy-tahoe-for-1-rp3 https://www.cbsnews.com/news/aircanada-chatbot-discount-customer https://www.bbc.com/news/technology-35902104 https://www.spiceworks.com/tech/artiﬁcial-intelligence/news/meta-blender-bot-3-controversy https://www.linkedin.com/posts/stephanjanssen_princoming-activity-7285987635628507136-9Ubw

@shelajev @edeandrea Testing Strategies

@shelajev @edeandrea Why don’t normal tests work? What do we
need to do differently?

@shelajev @edeandrea This isn’t the answer!

Model AI-Infused application Integration Points Observability (metrics, tracing, auditing) Fault Tolerance (timeout, circuit-breaker, non-blocking, rate limiting, fallbacks, …) What’s the difference between these?

@shelajev @edeandrea https://library.wiremock.org/catalog/api/o/openai.com/openai-com https://mockgpt.wiremock.io https://docs.quarkiverse.io/quarkus-wiremock/dev

@shelajev @edeandrea https://www.trtworld.com/europe/swedish-recycling-so-successful-it-is-importing-rubbish-24491

@shelajev @edeandrea What happens when we do this?

@shelajev @edeandrea Guardrails

@shelajev @edeandrea Guardrails Prompt: Please return a JSON document in
the following format: { “name: “String”, “countryOfOrigin”: “String”} Response: Here is your JSON: ```json { “name”: “Eric”, “countryOfOrigin”: “USA” } ``` 👿 😱 Just give me the JSON!! 😭

@shelajev @edeandrea Guardrails - Functions used to validate the input
and output of the model - Detect invalid input or output - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at ﬁrst failure

@shelajev @edeandrea Retry and Reprompt Output guardrails can have 4
different outcomes: - Success - Response is passed to the caller or next guardrail - Fatal - Stop and throw an exception - Retry - Call the model again with the same context we never know ;-) - Reprompt - Call the model again with another message in the model indicating how to ﬁx the response

@shelajev @edeandrea Observability

@shelajev @edeandrea Observability Collect metrics - Exposed as Prometheus -
Track token usage & cost OpenTelemetry Tracing - Trace interactions with the LLM Auditing - Track of interactions with the LLM - Can be persisted - Implemented by the application code

@shelajev @edeandrea Practices

@shelajev @edeandrea GitHub Actions

@shelajev @edeandrea Testcontainers Module

@shelajev @edeandrea Quarkus DevService

@shelajev @edeandrea AI and CI name: build-and-test on: push: pull_request:
jobs: jvm-build-test : runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Setup Java uses: actions/setup-java@v4 with: java-version: 21 distribution: temurin cache: maven - name: Build and test run: ./mvnw clean verify [INFO] ------------------------------------------------------- [INFO] T E S T S [INFO] ------------------------------------------------------- [INFO] Running org.ericoleg.ndnp.ai.guardrail.CompositeOutputGuardrailTests INFO [org.tes.DockerClientFactory] (build-6) Testcontainers version: 1.20.4 INFO [org.tes.ima.PullPolicy] (build-6) Image pull policy will be performed by: DefaultPullPolicy() INFO [tc.ollama/ollama:latest] (build-28) Pulling docker image: ollama/ollama:latest. Please be patient; this may take some time but only needs to be done once. INFO [tc.ollama/ollama:latest] (docker-java-stream--32075139) Pulling image layers: 1 pending, 3 downloaded, 3 extracted, (1 GB/? MB) INFO [tc.ollama/ollama:latest] (docker-java-stream--32075139) Pull complete. 4 layers, pulled in 27s (downloaded 1 GB at 55 MB/s) INFO [tc.ollama/ollama:latest] (build-28) Image ollama/ollama:latest pull took PT28.552217137S INFO [tc.ollama/ollama:latest] (build-28) Creating container for image: ollama/ollama:latest INFO [tc.ollama/ollama:latest] (build-28) Container ollama/ollama:latest is starting: f2e61ad1b3490bec2f69db44ee0bd946d543c703fd3f30a0c507ac0b9c5db9a1 INFO [tc.ollama/ollama:latest] (build-28) Container ollama/ollama:latest started in PT0.681272661S INFO [io.qua.lan.oll.dep.dev.OllamaDevServicesProcessor] (build-28) Dev Services for Ollama started. INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-6) Pulling model llama3.2 INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 0.01% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 27.39% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 60.84% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 94.33% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 99.43% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-4-Worker-0) Verifying and cleaning up INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-6) Pulling model snowflake-arctic-embed INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Downloading snowflake-arctic-embed - Progress: 1.90% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Downloading snowflake-arctic-embed - Progress: 94.85% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Verifying and cleaning up INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Ingesting documents from path: src/main/resources/policies, path matcher = glob:**, recursive = true INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Ingested 1 files as 2 documents INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Writing embeddings to /home/runner/work/non-deterministic-no-problem/non-deterministic-no-problem/easy-rag-embeddings.json INFO [io.quarkus] (main) non-deterministic-no-problem 1.0 on JVM (powered by Quarkus 3.17.7) started in 68.656s. Listening on: http://0.0.0.0:8081 INFO [io.quarkus] (main) Profiles test,ollama activated. INFO [io.quarkus] (main) Installed features: [agroal, awt, cdi, config-yaml, hibernate-orm, hibernate-orm-panache, jdbc-h2, langchain4j, langchain4j-easy-rag, langchain4j-ollama, langchain4j-ollama-dev-service, langchain4j-openai, langchain4j-websockets-next, mailer, mailpit, micrometer, narayana-jta, opentelemetry, playwright, poi, quinoa, qute, rest, rest-client, rest-client-jackson, rest-jackson, smallrye-context-propagation, smallrye-health, vertx, websockets-next] ... [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.79 s -- in org.ericoleg.ndnp.ai.guardrail.CompositeOutputGuardrailTests [INFO] Running org.ericoleg.ndnp.resources.ClaimWebsocketChatBotTests ... INFO [io.qua.lan.eas.run.EasyRagRecorder] (main) Reading embeddings from /home/runner/work/non-deterministic-no-problem/non-deterministic-no-problem/easy-rag-embeddings.json ...

@shelajev @edeandrea Use production setup

@shelajev @edeandrea @shelajev @edeandrea Prompt Engineering

@shelajev @edeandrea Prompts: Conﬁguration, Code or Data?

@shelajev @edeandrea Selection decisions are not application based

@shelajev @edeandrea Prompt Engineering and team topologies I said JSON!

@shelajev @edeandrea • Like static analysis ◦ Are we getting
better or worse over time? • Need to be able to monitor/track Systematic Eval: are you getting better or worse? https://docs.quarkiverse.io/quarkus-langchain4j/dev/testing.html

@shelajev @edeandrea @shelajev @edeandrea Takeaways

@shelajev @edeandrea • Quarkus is awesome! Get the simple problems
out of the way ﬁrst. • Don’t forget your craft: DevOps process is there to help, write tests, expect change and failure, deploy often. • Local models are fun, but unless you’re an expert going with expensive, but powerful models is a good default rule of thumb. Eval later into using dumber models. • Models are just like all other software, package into containers, run like everything else. Actual takeaways

@shelajev @edeandrea https://quarkus.io @quarkusio https://quarkusio.zulipchat.com @quarkus.io

@shelajev @edeandrea How do I develop with and use containers?
How do I ﬁnd and share container images? How do I build compliant container images? How do I make my image builds faster? How can I run my resource-heavy services? Streamline your development practice A suite of solutions supporting great developer experiences with enterprise control docker.com

@shelajev @edeandrea @shelajev @edeandrea Thank You! https://www.jfokus.se/rate/2070 Please rate the
talk!

Jfokus 2025 - Non-deterministic? No problem! Yo...

Jfokus 2025 - Non-deterministic? No problem! You can test it!

Video

More Decks by Eric Deandrea

Other Decks in Technology

Featured

Transcript