wasn’t enterprise ready! • First stab at an LLM proxy used FastAPI (Python) • 100s of tenants = 100s of configs - configuration explosion Envoy Gateway is production-grade, let’s use that! • "Let's use Envoy directly" - all AI developers failed at setup • Even experienced Envoy users struggled with DIY AI config Dan brought this problem to the Envoy community and worked with Tetrate on a solution. Dan Sun: Bloomberg Cloud Native Compute Services & AI Inference and KServe Co-Founder
service that completes text, images, and audio • MCP: Primarily a client+server protocol for tools, though it does a bit more • Agent: An LLM loop that auto-completes actions (with tools), not just text
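A minimal sketch of the "agent" definition above: loop LLM completions, executing any tool calls, until the model returns plain text. This assumes an OpenAI-compatible API; `call_tool()` is a hypothetical tool dispatcher, not part of any SDK.

```python
import json
from openai import OpenAI

client = OpenAI()

def call_tool(tool_call) -> str:
    """Hypothetical dispatcher: run the named tool and return its output."""
    args = json.loads(tool_call.function.arguments)
    return json.dumps({"tool": tool_call.function.name, "args": args})

def agent_loop(messages: list, tools: list) -> str:
    while True:
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=tools)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # plain text: the loop is done
        messages.append(message)  # keep the assistant's tool request in history
        for tool_call in message.tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": call_tool(tool_call),
            })
```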
tolerant proxy that manages traffic from AI Agents in a multi-tenant deployment • Connect a routing API (typically OpenAI’s) to LLM or MCP providers • Throttle usage based on different tenant categories or cost models • Centralize authentication and authorization across different backends • Provide observability across all of the above
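A sketch of the first bullet from the client's side: the app speaks the OpenAI API, but the gateway routes to whichever LLM backend the tenant's config selects. The base_url and API key value here are hypothetical.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://envoy-ai-gateway.example.com/v1",  # the gateway, not api.openai.com
    api_key="tenant-scoped-key",  # gateway centralizes authn/z and throttling
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # may be served by OpenAI, Bedrock, or a self-hosted model
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```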
may think it is OpenAI when it is Bedrock: tracks the backend in dynamic metadata used in rate limiting and metrics. The original model may not be what’s used: tracks the original model sent to the proxy, the model in the request forwarded to the backend, and the model in the response. LLM evals need full request/response data: OpenInference format includes full LLM, MCP and Embedding inputs and outputs, with recording controls. Clients might not be instrumented: customizable session.id capture from headers provides grouping for those not using OpenTelemetry on the clients (see the sketch below).
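A sketch of session grouping for uninstrumented clients: send a session id as a plain HTTP header and configure the gateway to capture it as session.id. The header name "x-session-id" and the URL are assumptions, not defaults.

```python
from openai import OpenAI

# Every request from this client carries the session header, so the gateway
# can group its spans even though the client emits no OpenTelemetry itself.
client = OpenAI(
    base_url="http://envoy-ai-gateway.example.com/v1",
    api_key="tenant-scoped-key",
    default_headers={"x-session-id": "user-1234-conversation-7"},
)
```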
limiting and other features. You can place this alongside your other logging properties to see it in the context of single requests. Access Logging in Envoy AI Gateway
- `gen_ai.server.request.duration` - Total request time - `gen_ai.server.time_to_first_token` - Time to first token (streaming) - `gen_ai.server.time_per_output_token` - Inter-token latency (streaming) Exports metrics via OTLP, or scrape the Prometheus `/metrics` endpoint (see the sketch below) Adds performance data to dynamic metadata for downstream use: - `token_latency_ttft` - Time to first token in milliseconds (streaming) - `token_latency_itl` - Inter-token latency in milliseconds (streaming) Metrics in Envoy AI Gateway
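A sketch of the Prometheus path: scrape the gateway's `/metrics` endpoint and print the GenAI metrics. The host:port is an assumption; note that dots in the OTel metric names become underscores in Prometheus exposition format.

```python
import urllib.request

with urllib.request.urlopen("http://envoy-ai-gateway.example.com:9090/metrics") as resp:
    for line in resp.read().decode().splitlines():
        # skip HELP/TYPE comments, keep only the GenAI series
        if "gen_ai_server_" in line and not line.startswith("#"):
            print(line)  # e.g. gen_ai_server_request_duration_...{...} 0.42
```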
K8s OpenTelemetry traces add context to web service calls, applied carefully at the message layer for MCP! OpenInference is a trace schema built for evaluation of LLM, embedding and tool calls, by Arize (the dominant AI evaluation player) Tracing in Envoy AI Gateway
package: pip install elastic-opentelemetry Run edot-bootstrap, which analyzes the code to install any relevant instrumentation available: edot-bootstrap --action=install Add OpenTelemetry environment variables OTEL_EXPORTER_OTLP_ENDPOINT OTEL_EXPORTER_OTLP_HEADERS Prefix python with opentelemetry-instrument or use auto_instrumentation.initialize() github.com/elastic/elastic-otel-python
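A sketch of the programmatic alternative to prefixing `python` with `opentelemetry-instrument`. The endpoint and header values are placeholders; initialize() must run before the libraries you want instrumented are used.

```python
import os

# placeholders; in practice these come from your deployment environment
os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "https://otlp.example.com:443")
os.environ.setdefault("OTEL_EXPORTER_OTLP_HEADERS", "Authorization=ApiKey%20changeme")

from opentelemetry.instrumentation import auto_instrumentation

auto_instrumentation.initialize()  # loads the instrumentation edot-bootstrap installed
```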
AI Gateway supports trace propagation for LLM and MCP calls • Real request spans are added to traces created by EDOT • Spans captured at the gateway are in eval-ready OpenInference format!
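A sketch of that propagation end to end: an EDOT-instrumented client creates a span; the instrumented HTTP call carries a W3C traceparent header, so the gateway's OpenInference spans join the same trace. The URL and API key are hypothetical.

```python
from opentelemetry import trace
from openai import OpenAI

tracer = trace.get_tracer("agent-demo")
client = OpenAI(
    base_url="http://envoy-ai-gateway.example.com/v1",
    api_key="tenant-scoped-key",
)

with tracer.start_as_current_span("ask-the-agent"):
    # auto-instrumentation injects traceparent into this request, so the
    # gateway's LLM span becomes a child of "ask-the-agent"
    client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hello"}],
    )
```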
LLM evaluation, user feedback, experimentation. Or use the EDOT Collector to send to both Elastic and Arize Phoenix https://github.com/elastic/kibana/tree/main/x-pack/platform/packages/shared/kbn-evals
in production and can talk about it. The Tencent Kubernetes Engine team internally hosts Envoy AI Gateway for a Model as a Service (MaaS) Tetrate Agent Router Service (TARS) is the first public SaaS running Envoy AI Gateway... and is the recommended model provider in Goose! Blog on TARS+Goose with $10 free credit! Envoy AI Gateway production users are sharing!
completions tracing (e.g. for fine-tuning) isn’t complete • Access Log export to OTLP isn’t complete • Tracing needs to move to the upstream filter (to capture actual LLM requests) AI Gateway has some features TODO, regardless of OTel! • MCP is very new, so it will change quickly • OpenAI “Responses (stateful)” API is used in new Agent tools • Should it do other specs like A2A and ACP (LF)?
origin, with OpenTelemetry-native features that work well with EDOT. Elastic Distribution of OpenTelemetry (EDOT) makes it easier for clients to send the entire workflow, connecting traces together when routed through a gateway. OpenInference is the GenAI-tuned OpenTelemetry trace schema for evals, used by many frameworks and even Kibana! Agent apps go beyond LLMs and into tools and workflows. This is changing what it means to be an AI gateway! Key Takeaways & Next Steps linkedin.com/in/adrianfcole Blog on TARS+Goose with $10 free credit!