
Moving from LLM Gateways to a Single Agent Origin - All Things AI 2026

Adrian Cole

March 24, 2026


Transcript

  1. Moving from LLM Gateways to a Single Agent Origin
     Goose + MCP + ACP
     Adrian Cole · Principal Engineer at Tetrate
  2. @adrianfcole
     Principal engineer at Tetrate, maintainer of Goose
     Open Source history includes:
     Observability: OpenZipkin, OpenTelemetry, OpenInference (GenAI obs)
     Usability: wazero (Go for Wasm), func-e (easy Envoy)
     GenAI: Envoy AI Gateway, Goose, ACP, LlamaStack
     FOCUSED ON THE GENAI DEV -> PROD TRANSITION
  3. Quick Recap: LLM, MCP, Agent
     LLM: Web service that completes text, image, audio
     MCP: Client+server protocol for tools (GitHub, Kiwi, Postman, etc.)
     Agent: LLM loop that auto-completes actions (with tools), not just text
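The recap's definition of an agent (an LLM loop that auto-completes actions with tools) can be sketched in a few lines. Everything below is stubbed: `fake_llm` and the `search_flights` lambda stand in for a real chat-completion API and a real MCP tool.

```python
def fake_llm(messages):
    """Stub LLM: asks for a tool call first, then answers using the result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search_flights",
                              "args": {"date": "31/12/2026"}}}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"content": f"Found: {result}"}

# Stub tool registry; a real agent would call an MCP server here.
TOOLS = {"search_flights": lambda args: f"AF123 on {args['date']} for EUR 299"}

def run_agent(user_message, max_turns=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = fake_llm(messages)
        if "tool_call" in reply:              # the LLM wants to act, not answer
            call = reply["tool_call"]
            result = TOOLS[call["name"]](call["args"])
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]               # plain text: the loop is done
    raise RuntimeError("agent did not finish")

print(run_agent("Book me a flight"))
```

The loop terminates when the model returns text instead of a tool call; `max_turns` bounds runaway tool-calling.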
  4. MCP Grew Up Fast
     Nov 2024: MCP launched — local only, ~100 servers
     Mar 2025: Streamable HTTP — remote servers possible, SaaS floodgates open
     Sep 2025: GitHub MCP GA; Official Registry ~2K, Smithery ~6K servers
     Mar 2026: 16K+ servers, AAIF (Linux Foundation), multiple registries
  5. Goose: Action-Driven
     MCP-native: Built around MCP
     Community: >300 contributors
     Skills + Recipes: Reusable workflows
     AAIF: Linux Foundation
     block.github.io/goose
  6. Goose Recipes: Reusable Agent Workflows
     Reproducible | Shareable | Parameterized

     title: Flight Search
     description: Search flights via AI gateway.
     prompt: |
       Use search-flight to find flights from New York to Los Angeles
       on {{flight_date}}. Return the first 3 results.
     extensions:
       - name: mcp_gateway
         type: streamable_http
         uri: http://127.0.0.1:1975/mcp
     parameters:
       - key: flight_date
         input_type: string
         requirement: required
         default: "31/12/2026"
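A recipe's `{{parameter}}` placeholders are filled from passed-in values and declared defaults before the prompt is sent. A minimal sketch of that substitution step, using simple regex templating rather than Goose's actual (richer) template engine:

```python
import re

# Recipe fragment mirroring the slide's YAML, as a plain dict.
recipe = {
    "prompt": "Use search-flight to find flights from New York to Los Angeles "
              "on {{flight_date}}. Return the first 3 results.",
    "parameters": [
        {"key": "flight_date", "requirement": "required", "default": "31/12/2026"},
    ],
}

def render_prompt(recipe, overrides=None):
    """Merge defaults with overrides, check required params, fill the prompt."""
    values = {p["key"]: p.get("default") for p in recipe["parameters"]}
    values.update(overrides or {})
    missing = [p["key"] for p in recipe["parameters"]
               if p["requirement"] == "required" and values.get(p["key"]) is None]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), recipe["prompt"])

print(render_prompt(recipe, {"flight_date": "01/01/2027"}))
```

Calling `render_prompt(recipe)` with no overrides falls back to the declared default.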
  7. Envoy AI Gateway
     Co-founded by Bloomberg + Tetrate (Oct 2024)
     Open Source: CNCF-backed, Apache 2.0
     AI-aware: 20+ LLM providers + MCP server routing
     TARS: First public SaaS, recommended Goose provider
     Production users
     A CLUSTERED, FAULT-TOLERANT PROXY FOR AI AGENT TRAFFIC
  8. Single Origin: LLM + MCP Through One Proxy
     [Diagram: Local Agent, one token → AI Gateway → Calendar MCP, Search MCP, LLM, Backup LLM]
     ONE PROXY FOR ROUTING, AUTH, RATE LIMITS, AND OBSERVABILITY.
     THE AGENT DOESN’T CARE WHICH BACKEND IS BEHIND THE GATEWAY.
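The "one proxy" idea boils down to a routing table keyed by path, with failover inside each entry. A toy sketch; the upstream URLs and the health-check callable are made up for illustration:

```python
# One listener, per-path upstreams; list order doubles as failover order.
ROUTES = {
    "/v1/chat/completions": ["https://api.openai.com/v1/chat/completions",
                             "https://backup-llm.internal/v1/chat/completions"],
    "/mcp": ["https://mcp.kiwi.com",
             "https://calendar-mcp.internal"],
}

def pick_upstream(path, healthy):
    """Return the first healthy upstream for a path."""
    for upstream in ROUTES.get(path, []):
        if healthy(upstream):
            return upstream
    raise LookupError(f"no healthy upstream for {path}")

# Primary LLM down: traffic falls through to the backup transparently,
# and the agent never sees a different origin.
print(pick_upstream("/v1/chat/completions", healthy=lambda u: "backup" in u))
```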
  9. Authorization: The Tricky Part
     [Diagram: Agent → GitHub (OAuth), Postman (API key), Kiwi (anonymous)]
     Fragmented auth: OAuth here, API keys there, anonymous elsewhere
     LLM backends: API keys · MCP servers: OAuth 2.1
     Gateway injects credentials per-backend — no agent-side credential management
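Per-backend credential injection can be sketched as a header rewrite: strip whatever the agent sent and attach the credential the upstream actually wants. The backend names and secrets below are illustrative:

```python
# Per-upstream credentials held by the gateway, never by the agent.
CREDENTIALS = {
    "openai": {"Authorization": "Bearer sk-real-openai-key"},   # API key
    "github": {"Authorization": "Bearer gho_oauth_token"},      # OAuth token
    "kiwi":   {},                                               # anonymous
}

def outbound_headers(upstream, inbound_headers):
    """Strip the agent's gateway token and inject the upstream's credential."""
    headers = {k: v for k, v in inbound_headers.items()
               if k.lower() != "authorization"}
    headers.update(CREDENTIALS[upstream])
    return headers

hdrs = outbound_headers("github", {"Authorization": "Bearer gateway-token",
                                   "Content-Type": "application/json"})
print(hdrs)
```

The agent presents one gateway token everywhere; the swap happens per request, per backend.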
  10. What the Gateway Gives You
      Central auth: API keys for LLM, OAuth for MCP, all in one place
      Token rate limiting: Per-tenant budgets; prevent runaway agents
      Observability: Access logs, Prometheus metrics, OpenInference traces
      Session correlation: Agent session IDs auto-tag all logs and traces
      MCP multiplexing: Aggregate tools from multiple backends
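The "per-tenant budgets" row can be illustrated with a toy counter; a real gateway would enforce this in shared state over a time window rather than an in-memory dict:

```python
class TokenBudget:
    """Toy per-tenant token budget to stop runaway agents."""

    def __init__(self, limit_per_tenant):
        self.limit = limit_per_tenant
        self.used = {}

    def allow(self, tenant, tokens):
        spent = self.used.get(tenant, 0)
        if spent + tokens > self.limit:
            return False              # a real gateway would answer 429 here
        self.used[tenant] = spent + tokens
        return True

budget = TokenBudget(limit_per_tenant=1000)
assert budget.allow("team-a", 800)
assert not budget.allow("team-a", 500)   # would exceed team-a's 1000
assert budget.allow("team-b", 500)       # separate tenant, separate budget
```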
  11. Agent Flow: Single Origin in Action
      User → Agent → Gateway → LLM / MCP
      1. Book flight
      2. user message
      3. tool_call: search_flights
      4. tools/call: search_flights
      5. flight options
      6. tool_result
      7. "Found AF123 for €299"
      8. response
      BOTH LLM AND MCP CALLS THROUGH THE SAME PROXY. ONE ORIGIN, ONE SET OF LOGS.
  12. Session Correlation: Free Observability
      Goose sends session ID in HTTP headers
      Gateway auto-tags every access log entry
      Filter by session in your observability tool
      No agent-side instrumentation needed

      {"session_id":"goose-abc123","method":"POST","path":"/v1/chat/completions",
       "upstream":"openai","tokens_in":342,"tokens_out":89,"latency_ms":1240}
      {"session_id":"goose-abc123","method":"POST","path":"/mcp",
       "upstream":"kiwi","tool":"search-flight","latency_ms":820}
      {"session_id":"goose-abc123","method":"POST","path":"/v1/chat/completions",
       "upstream":"openai","tokens_in":1205,"tokens_out":156,"latency_ms":2100}
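Given access logs shaped like those JSON lines, session correlation is just a filter on `session_id`. A sketch with a trimmed-down copy of the log data:

```python
import json

# Abbreviated gateway access-log lines in the slide's shape.
log_lines = [
    '{"session_id":"goose-abc123","upstream":"openai","tokens_in":342,"tokens_out":89}',
    '{"session_id":"goose-abc123","upstream":"kiwi","tool":"search-flight"}',
    '{"session_id":"goose-xyz999","upstream":"openai","tokens_in":10,"tokens_out":5}',
]

def session_view(lines, session_id):
    """Reconstruct one agent session across all upstreams from the log."""
    entries = [json.loads(line) for line in lines]
    return [e for e in entries if e["session_id"] == session_id]

session = session_view(log_lines, "goose-abc123")
print(len(session), "entries,",
      sum(e.get("tokens_in", 0) + e.get("tokens_out", 0) for e in session),
      "tokens")
```

One filter yields both the LLM and MCP calls of the session, plus a token total, with no agent-side instrumentation.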
  13. Demo: Goose Recipe Through the Gateway

      # Terminal 1: start the gateway
      OPENAI_BASE_URL=http://localhost:11434/v1 OPENAI_API_KEY=unused \
        aigw run --mcp-json '{"mcpServers":{"kiwi":{"type":"http","url":"https://mcp.kiwi.com"}}}'

      # Terminal 2: run the recipe
      OPENAI_HOST=http://127.0.0.1:1975 OPENAI_API_KEY=test-key \
        goose run --provider openai --model qwen3:1.7b \
        --recipe kiwi_recipe.yaml --params flight_date=31/12/2026
  14. The Agentic Landscape Shifted
      Agentic subscriptions: Claude Code, Cursor, Kiro — top plans are $200/mo
      Dozens of agents: Claude, Codex, Cursor, Goose, OpenClaw, StakPak
      Each tied to a specific CLI or SDK
      "Love the agent but need JetBrains? Love the UI but the agent is too expensive?"
  15. Goose’s Journey: CLI to Protocol Citizen
      Late 2024: Python CLI, direct APIs
      Dec 2024: Rust rewrite, desktop app
      Oct 2025: ACP server — Zed, JetBrains connect
      Feb 2026: ACP providers — Claude, Codex, Gemini as engine
      Started as a monolith: own UI, own LLM calls, own tools
      ACP separated frontend from engine
      Now: editors use Goose (ACP server) AND Goose uses other agents (ACP client)
      Claude/Codex via Zed-built adapters; Gemini CLI natively speaks ACP
  16. Agentic Protocols: ACP Is Not Alone
      At the AI gateway layer:
      MCP: agent → tools
      ACP: editor → agent
      A2A: agent → agent
      DIFFERENT LAYERS, NOT COMPETING — ENTERPRISES WILL NEED GATEWAYS FOR ALL THREE
  17. Gateway in Front of the Agent
      [Diagram: Zed / JetBrains → ACP → AI Gateway → ACP → Goose]
      Editor connects to the agent via ACP; the gateway sits in between
      Rewrite MCP servers per user, set LLM policy, centralize auth
      The agent sees only what the gateway allows
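"Rewrite MCP servers per user" and "set LLM per user" amount to intersecting what the agent requests with per-user policy. A hypothetical sketch; the policy shape and names are made up, not ACP's wire format:

```python
# Per-user policy held by the gateway: which model and which MCP servers
# each editor-connected user may reach.
POLICY = {
    "alice": {"model": "gpt-4.1", "mcp_servers": ["github", "calendar"]},
    "bob":   {"model": "qwen3:1.7b", "mcp_servers": ["calendar"]},
}

def effective_config(user, requested_servers):
    """Intersect what the agent asks for with what policy allows."""
    allowed = POLICY[user]
    return {
        "model": allowed["model"],                 # LLM is set per user
        "mcp_servers": [s for s in requested_servers
                        if s in allowed["mcp_servers"]],
    }

# bob asks for three servers but policy only grants calendar.
print(effective_config("bob", ["github", "calendar", "kiwi"]))
```

The agent downstream of the gateway only ever sees the filtered list, which is the "agent sees only what the gateway allows" property.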
  18. Gateway Behind the Agent
      [Diagram: Goose (ACP client) → AI Gateway (route + inject auth) → Claude / Codex / Gemini]
      Goose delegates to any engine via ACP providers: Claude, Codex, Gemini as engines
      The gateway routes and authenticates between Goose and each engine
      Keep Goose’s recipes and extensions, use any subscription
  19. Full Circle: Single Agent Origin
      Auth · Rate Limits · Observability
      LLM Routing + MCP Routing + ACP (next)
      Started: gateway as LLM proxy (routing, rate limiting)
      Added: MCP proxy (tool calls, OAuth, server multiplexing)
      Emerging: ACP-aware (session correlation, auth centralization)
  20. Key Takeaways
      1. Single origin for LLM + MCP calls
         One proxy: routing, auth, throttling, observability. No per-developer drift.
      2. Goose recipes
         Reusable multi-step workflows tested end-to-end through the gateway.
         Session IDs propagate for free observability.
      3. ACP changes the landscape
         44 agents, 42 clients. Break up with your agent without breaking your workflow.
         Gateways extend from LLM+MCP to the full agent communication stack.
  21. Thank You
      Install Goose · $5 free LLMs from TARS · Agent Client Protocol · Agentic AI Foundation