
Testing GenAI Applications - All Things Open AI

This was a one-hour workshop at All Things Open AI on testing GenAI applications.

https://allthingsopen.ai/sessions/testing-genai-applications

Pretty cool theater environment and a lot of engaged folks inside. We found some glitches in this first run, but all are corrected in the presentation materials here: https://github.com/elastic/testing-genai-applications

Adrian Cole

March 17, 2025
Transcript

  1. Introductions! I’m Adrian from the Elastic Observability team. I mostly work on the GenAI ecosystem, including OpenTelemetry. github.com/codefromthecrypt x.com/adrianfcole
  2. Objectives • Understand the basics of testing GenAI applications •

    Learn to use OpenAI CLI and SDK • Explore traffic inspection and observability • Write and test Python scripts for GenAI • Learn advanced testing techniques like HTTP replay and LLM eval
  3. Agenda • Example application • Prerequisites Setup • Exercises • Q&A https://github.com/elastic/testing-genai-applications (or scan the QR code for the same!)
  4. All exercises use the same example application. We use the OpenAI CLI or Python SDK to ask it a question via its API.
  5. Prerequisites Setup • Docker is generally required, but you can run exercises directly in Python • If you use Python, there is less to download, especially if you share one .venv • Docker is the easiest path, and we use docker compose frequently • An OpenAI-compatible inference platform for running LLMs: we have instructions for OpenAI, Ollama and Ramalama • OpenTelemetry is used to demonstrate observability: we have instructions for console, Elastic Stack and otel-tui • mitmproxy is used to demonstrate HTTP traffic interception
  6. Exercises • Use the OpenAI CLI • Inspect CLI traffic

    with mitmproxy • Trace CLI traffic with OpenTelemetry • Write an OpenAI application • Integration test your application • Unit test your application with recorded HTTP responses • Evaluation test your application using an LLM as a Judge
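The unit-test exercise above leans on recorded HTTP responses (in Python, pytest-vcr replays real traffic from cassette files). A minimal sketch of the same idea using only the standard library — the `ask()` helper, the transport object, and the example question are all assumptions of this sketch, with a canned payload standing in for a real recording:

```python
from unittest import mock

# Hypothetical application helper under test: posts a chat request and
# returns the assistant's message content.
def ask(transport, question: str) -> str:
    payload = transport.post(
        "/v1/chat/completions",
        {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": question}]},
    )
    return payload["choices"][0]["message"]["content"]

# "Recorded" response: a canned payload in the shape of an OpenAI chat
# completion, standing in for a pytest-vcr cassette of real traffic.
RECORDED = {"choices": [{"message": {"role": "assistant", "content": "Atlantic Ocean"}}]}

def test_ask_replays_recording():
    transport = mock.Mock()
    transport.post.return_value = RECORDED
    # Replayed responses never change, so the assertion can be strict.
    assert ask(transport, "Which ocean is south of Nigeria?") == "Atlantic Ocean"

test_ask_replays_recording()
print("unit test passed")
```

Because the recording is fixed, this kind of test can demand exact equality — unlike an integration test against a live model, which should assert more loosely.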
  7. 1: Use the OpenAI CLI • Learn to query LLMs

    with the OpenAI CLI • Run a simple question and get a response • Expect "Atlantic Ocean" as the answer
  8. 2: Inspect OpenAI traffic with mitmproxy • Run mitmweb to

    start the proxy • Run the OpenAI CLI with proxy configuration • Inspect the captured traffic in the web interface
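Running the CLI "with proxy configuration" mostly comes down to environment variables: the OpenAI SDK sends requests through httpx, which honors `HTTPS_PROXY`, and `SSL_CERT_FILE` lets it trust mitmproxy's generated CA. A sketch — the helper name is my own; port 8080 is mitmproxy's default, and the CA path is mitmproxy's default location:

```python
import os

def configure_mitmproxy(port: int = 8080) -> dict:
    """Route HTTPS traffic from the OpenAI CLI/SDK through a local mitmproxy."""
    env = {
        # httpx (used by the OpenAI SDK) honors this standard proxy variable.
        "HTTPS_PROXY": f"http://localhost:{port}",
        # Trust mitmproxy's CA so intercepted TLS connections verify.
        "SSL_CERT_FILE": os.path.expanduser("~/.mitmproxy/mitmproxy-ca-cert.pem"),
    }
    os.environ.update(env)
    return env

print(configure_mitmproxy()["HTTPS_PROXY"])
```

With these set, requests made by the CLI appear in the mitmweb interface for inspection.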
  9. 3: Trace OpenAI traffic with OpenTelemetry • Instrumentation without code

    changes • GenAI signals for latency, prompt and usage • Choose your own APM with portable export
  10. 4: Write an OpenAI Application • Create Python script using

    OpenAI SDK • Manage configurations with environment variables • Enable observability via OpenTelemetry
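A minimal sketch of such a script, assuming the `openai` package is installed. `OPENAI_BASE_URL` and `OPENAI_API_KEY` are variables the SDK itself reads; `CHAT_MODEL` and the question are assumptions of this sketch, not part of the workshop's actual application:

```python
import os

def load_config() -> dict:
    """Resolve settings from environment variables, with defaults."""
    return {
        # Overriding the base URL lets the same script target OpenAI or Ollama.
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "model": os.environ.get("CHAT_MODEL", "gpt-4o-mini"),
    }

def main() -> None:
    # Imported here so load_config stays testable without the SDK installed.
    from openai import OpenAI

    config = load_config()
    client = OpenAI(base_url=config["base_url"])  # key comes from OPENAI_API_KEY
    response = client.chat.completions.create(
        model=config["model"],
        messages=[{"role": "user", "content": "Which ocean is south of Nigeria?"}],
    )
    print(response.choices[0].message.content)

print(load_config())  # call main() with OPENAI_API_KEY set to query the model
```

For the observability part, zero-code instrumentation (e.g. `opentelemetry-instrument python app.py` with an EDOT SDK installed) can trace this script without modifying it.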
  11. 7: Evaluate your application using an LLM as a Judge

    • Assess relevancy and hallucinations with DeepEval metrics. • Trace evaluation via OpenTelemetry.
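DeepEval supplies ready-made relevancy and hallucination metrics; the underlying LLM-as-a-judge pattern looks roughly like this sketch, where the judge callable (here a stub) would normally be another chat-completion call:

```python
def judge_relevancy(judge, question: str, answer: str) -> float:
    """Ask a judge model to score an answer's relevancy from 0.0 to 1.0."""
    prompt = (
        "Score from 0.0 to 1.0 how relevant the answer is to the question.\n"
        f"Question: {question}\nAnswer: {answer}\nReply with only the number."
    )
    return float(judge(prompt))

# Stub judge for illustration; in the workshop this would be an LLM call.
def stub_judge(prompt: str) -> str:
    return "0.9" if "Atlantic" in prompt else "0.1"

score = judge_relevancy(stub_judge, "Which ocean is south of Nigeria?", "The Atlantic Ocean")
# Evaluation tests use thresholds, not exact matches, since judges vary.
assert score >= 0.7
print(f"relevancy: {score}")
```

Because the judge is itself an LLM call, tracing it via OpenTelemetry (as the slide notes) shows both the application's request and the evaluation request side by side.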
  12. Takeaways and Thanks! • OpenAI requires your best and most creative testing skills • Unit tests should replay recorded real HTTP requests in whatever way is best for your language; if using Python, use pytest-vcr • Integration tests should use OpenAI, but allow local model usage as well; Ollama is a very good option for local model hosting, and Qwen 2.5 is a great model • Tests themselves should be strict in unit tests and flexible in integration tests; LLM responses are not entirely predictable and can sometimes miss — be aware of this • Observability and evaluation: use Elastic Distribution of OpenTelemetry (EDOT) SDKs to enable observability, and try Elastic Stack and eval platforms like Arize Phoenix and Langtrace github.com/codefromthecrypt x.com/adrianfcole www.linkedin.com/in/adrianfcole