
Testing GenAI Applications - All Things Open AI

This was a one-hour workshop at All Things Open AI on testing GenAI applications.

https://allthingsopen.ai/sessions/testing-genai-applications

Pretty cool theater environment and a lot of engaged folks inside. We found some glitches in this first run, but all are corrected in the presentation materials here: https://github.com/elastic/testing-genai-applications

Adrian Cole

March 17, 2025
Transcript

  1. Introductions! I’m Adrian from the Elastic Observability team. I mostly work on the GenAI ecosystem, including OpenTelemetry. github.com/codefromthecrypt x.com/adrianfcole
  2. Objectives • Understand the basics of testing GenAI applications •

    Learn to use OpenAI CLI and SDK • Explore traffic inspection and observability • Write and test Python scripts for GenAI • Learn advanced testing techniques like HTTP replay and LLM eval
  3. Agenda • Example application • Prerequisites Setup • Exercises • Q&A https://github.com/elastic/testing-genai-applications (or scan the QR code for the same!)
  4. All exercises use the same example application. We use the OpenAI CLI or Python SDK to ask it a question via its API.
  5. Prerequisites Setup • Docker is generally required, but you can run exercises directly in Python • If you use Python, there is less to download, especially if you share one .venv • Docker is the easiest path, and we use docker compose frequently • An OpenAI-compatible inference platform for running LLMs: we have instructions for OpenAI, Ollama and Ramalama • OpenTelemetry is used to demonstrate observability: we have instructions for console, Elastic Stack and otel-tui • mitmproxy is used to demonstrate HTTP traffic interception
  6. Exercises • Use the OpenAI CLI • Inspect CLI traffic

    with mitmproxy • Trace CLI traffic with OpenTelemetry • Write an OpenAI application • Integration test your application • Unit test your application with recorded HTTP responses • Evaluation test your application using an LLM as a Judge
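The unit-test exercise above leans on recorded HTTP responses (in Python, pytest-vcr replays real traffic from cassette files). A minimal sketch of the same idea using only the standard library — the `ask()` helper, the transport object, and the example question are all assumptions of this sketch, with a canned payload standing in for a real recording:

```python
from unittest import mock

# Hypothetical application helper under test: posts a chat request and
# returns the assistant's message content.
def ask(transport, question: str) -> str:
    payload = transport.post(
        "/v1/chat/completions",
        {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": question}]},
    )
    return payload["choices"][0]["message"]["content"]

# "Recorded" response: a canned payload in the shape of an OpenAI chat
# completion, standing in for a pytest-vcr cassette of real traffic.
RECORDED = {"choices": [{"message": {"role": "assistant", "content": "Atlantic Ocean"}}]}

def test_ask_replays_recording():
    transport = mock.Mock()
    transport.post.return_value = RECORDED
    # Replayed responses never change, so the assertion can be strict.
    assert ask(transport, "Which ocean is south of Nigeria?") == "Atlantic Ocean"

test_ask_replays_recording()
print("unit test passed")
```

Because the recording is fixed, this kind of test can demand exact equality — unlike an integration test against a live model, which should assert more loosely.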
  7. 1: Use the OpenAI CLI • Learn to query LLMs

    with the OpenAI CLI • Run a simple question and get a response • Expect "Atlantic Ocean" as the answer
  8. 2: Inspect OpenAI traffic with mitmproxy • Run mitmweb to

    start the proxy • Run the OpenAI CLI with proxy configuration • Inspect the captured traffic in the web interface
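Running the CLI "with proxy configuration" mostly comes down to environment variables: the OpenAI SDK sends requests through httpx, which honors `HTTPS_PROXY`, and `SSL_CERT_FILE` lets it trust mitmproxy's generated CA. A sketch — the helper name is my own; port 8080 is mitmproxy's default, and the CA path is mitmproxy's default location:

```python
import os

def configure_mitmproxy(port: int = 8080) -> dict:
    """Route HTTPS traffic from the OpenAI CLI/SDK through a local mitmproxy."""
    env = {
        # httpx (used by the OpenAI SDK) honors this standard proxy variable.
        "HTTPS_PROXY": f"http://localhost:{port}",
        # Trust mitmproxy's CA so intercepted TLS connections verify.
        "SSL_CERT_FILE": os.path.expanduser("~/.mitmproxy/mitmproxy-ca-cert.pem"),
    }
    os.environ.update(env)
    return env

print(configure_mitmproxy()["HTTPS_PROXY"])
```

With these set, requests made by the CLI appear in the mitmweb interface for inspection.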
  9. 3: Trace OpenAI traffic with OpenTelemetry • Instrumentation without code

    changes • GenAI signals for latency, prompt and usage • Choose your own APM with portable export
  10. 4: Write an OpenAI Application • Create Python script using

    OpenAI SDK • Manage configurations with environment variables • Enable observability via OpenTelemetry
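A minimal sketch of such a script, assuming the `openai` package is installed. `OPENAI_BASE_URL` and `OPENAI_API_KEY` are variables the SDK itself reads; `CHAT_MODEL` and the question are assumptions of this sketch, not part of the workshop's actual application:

```python
import os

def load_config() -> dict:
    """Resolve settings from environment variables, with defaults."""
    return {
        # Overriding the base URL lets the same script target OpenAI or Ollama.
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "model": os.environ.get("CHAT_MODEL", "gpt-4o-mini"),
    }

def main() -> None:
    # Imported here so load_config stays testable without the SDK installed.
    from openai import OpenAI

    config = load_config()
    client = OpenAI(base_url=config["base_url"])  # key comes from OPENAI_API_KEY
    response = client.chat.completions.create(
        model=config["model"],
        messages=[{"role": "user", "content": "Which ocean is south of Nigeria?"}],
    )
    print(response.choices[0].message.content)

print(load_config())  # call main() with OPENAI_API_KEY set to query the model
```

For the observability part, zero-code instrumentation (e.g. `opentelemetry-instrument python app.py` with an EDOT SDK installed) can trace this script without modifying it.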
  11. 7: Evaluate your application using an LLM as a Judge

    • Assess relevancy and hallucinations with DeepEval metrics. • Trace evaluation via OpenTelemetry.
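DeepEval supplies ready-made relevancy and hallucination metrics; the underlying LLM-as-a-judge pattern looks roughly like this sketch, where the judge callable (here a stub) would normally be another chat-completion call:

```python
def judge_relevancy(judge, question: str, answer: str) -> float:
    """Ask a judge model to score an answer's relevancy from 0.0 to 1.0."""
    prompt = (
        "Score from 0.0 to 1.0 how relevant the answer is to the question.\n"
        f"Question: {question}\nAnswer: {answer}\nReply with only the number."
    )
    return float(judge(prompt))

# Stub judge for illustration; in the workshop this would be an LLM call.
def stub_judge(prompt: str) -> str:
    return "0.9" if "Atlantic" in prompt else "0.1"

score = judge_relevancy(stub_judge, "Which ocean is south of Nigeria?", "The Atlantic Ocean")
# Evaluation tests use thresholds, not exact matches, since judges vary.
assert score >= 0.7
print(f"relevancy: {score}")
```

Because the judge is itself an LLM call, tracing it via OpenTelemetry (as the slide notes) shows both the application's request and the evaluation request side by side.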
  12. Takeaways and Thanks! • OpenAI requires your best and most creative testing skills • Unit tests should replay recorded real HTTP requests in whatever way is best for your language; if using Python, use pytest-vcr • Integration tests should use OpenAI, but allow local model usage as well; Ollama is a very good option for local model hosting, and Qwen 2.5 is a great model • Tests themselves should be strict in unit tests and flexible in integration tests; LLM responses are not entirely predictable and can sometimes miss — be aware of this • Observability and evaluation: use Elastic Distribution of OpenTelemetry (EDOT) SDKs to enable observability, and try Elastic Stack and eval platforms like Arize Phoenix and Langtrace github.com/codefromthecrypt x.com/adrianfcole www.linkedin.com/in/adrianfcole