
From Cloud Native applications to Multi Agent Distributed Systems

for more information visit https://salaboy.com

Salaboy

July 01, 2026

Transcript

  1. New Landscape Agents • Frameworks • Protocols ▪ MCP ▪

    A2A • Other stuff ▪ Skills Infra GPUs to run LLMs AI Gateways KV Caches
  2. Disclaimer

    Business Agents: Orchestrate business workflows, integrate APIs, and automate enterprise logic to drive decisions.
    Coding Agents: Write, test, debug, and execute code autonomously to build software and solve technical tasks.
  3. Model Context Protocol

    - Anthropic, Nov 2024 -> now AAIF
    - Client/Server architecture (LLM is the client)
    - Provides tool definitions that LLMs can call when needed
    FOLLOW US CLOUDNATIVEDAYSITALY.ORG
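As a rough illustration of the "tool definitions" bullet on slide 3: an MCP server advertises each tool with a name, a description, and a JSON Schema describing its input. The sketch below builds such a definition as a plain Python dict, without any MCP SDK. The `cook_pizza` tool, its parameters, and the `missing_required` helper are hypothetical and not taken from the talk.

```python
import json


def make_tool_definition() -> dict:
    """Build a tool definition shaped like an MCP tool listing entry.

    The tool itself (cook_pizza) is a made-up example.
    """
    return {
        "name": "cook_pizza",
        "description": "Cook a pizza of the given type.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "pizza_type": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1},
            },
            "required": ["pizza_type"],
        },
    }


def missing_required(tool: dict, arguments: dict) -> list:
    """Return the required argument names absent from a proposed tool call."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


if __name__ == "__main__":
    print(json.dumps(make_tool_definition(), indent=2))
```

The model sees only the definition; it is up to the client and server to reject calls whose arguments do not match the schema, which is what the small helper hints at.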
  4. Agent to Agent (A2A)

    - Google, April 2025
    - Agent to Agent discovery and communication
    - Enterprise focus: security, long running, built on standards
    - Modality agnostic (Text, Audio, Video)
    - Complements MCP, it doesn’t compete
  5. Agentic Engineering

    • Level 1 - GenAI Assisted -> a chatbot assists in creating the order
    • Level 2 - GenAI Augmented -> agents drive the operations, humans approve
    • Level 3 - Spec-Centric -> analyze the order and validate what is delivered
    • Level 4 - Selective Autonomy -> selecting drinks is auto-approved
    • Level 5 - Full Autonomy -> a human is needed when the oven breaks
    BLOG https://www.salaboy.com/2026/06/08/reacting-to-ai/
  6. Why are these apps so hard to observe?

    1. Traditional tracing assumes linear request/response flows
    2. Context propagation often breaks between agents, tools, and model calls
    3. Agent decisions create dynamic, non-deterministic workflows
    4. Execution spans many systems (LLMs, APIs, MCP servers, skills)
    5. Key context lives outside the runtime (prompts, reasoning)
  7. Order pizza trace

    Numbers:
    • ~275 spans
    • ~48 seconds
    • 10 services/agents
    Trace includes:
    • LLM reasoning loops
    • MCP tool calls
    • Workflow orchestration
    • Polling loops
    • Service calls
  8. More Telemetry != Better Telemetry

    Observation: 275+ spans per order
    But:
    • many spans represent infrastructure
    • some represent retries or polling (which can create a lot of noise)
    • some represent framework calls (chatty MCP / A2A)
    Goal: meaningful spans, not just more spans
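One way to picture the "meaningful spans" goal from slide 8 is a filter that separates business-level spans from infrastructure, polling, and framework noise. The sketch below is only an assumption about how that could look: the `Span` class, the `kind` labels, and the example span names are hypothetical and do not come from the trace shown in the talk.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    kind: str  # hypothetical label: "business", "infra", "polling" or "framework"


# Categories the slide lists as sources of noise.
NOISY_KINDS = {"infra", "polling", "framework"}


def meaningful_spans(spans: list) -> list:
    """Keep only the spans that are not labelled as noise."""
    return [span for span in spans if span.kind not in NOISY_KINDS]


if __name__ == "__main__":
    trace = [
        Span("place-order", "business"),
        Span("poll-order-status", "polling"),
        Span("mcp-list-tools", "framework"),
        Span("cook-pizza", "business"),
    ]
    for span in meaningful_spans(trace):
        print(span.name)
```

In a real pipeline, dropping spans can orphan their children, so a view-time filter or a sampling rule is likely safer than deleting spans at the source.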
  9. Agent Reasoning (GenAI SemConv)

    Example span from the trace: completion claude-haiku-4-5
    Attributes:
    gen_ai.request.model = claude-haiku-4-5
    gen_ai.prompt = "You are a pizza cooking agent..."
    gen_ai.completion = "I'll cook a Pepperoni pizza..."
    gen_ai.usage.prompt_tokens = 1693
    gen_ai.usage.completion_tokens = 71
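The attributes on slide 9 can be read back programmatically, for example to total token usage per span. The sketch below treats the span attributes as a plain dict rather than using an OpenTelemetry SDK. Newer versions of the GenAI semantic conventions name the token counters `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`, so the hypothetical `token_usage` helper accepts either spelling.

```python
# Attribute values copied from the example span on slide 9.
SLIDE_SPAN_ATTRIBUTES = {
    "gen_ai.request.model": "claude-haiku-4-5",
    "gen_ai.prompt": "You are a pizza cooking agent...",
    "gen_ai.completion": "I'll cook a Pepperoni pizza...",
    "gen_ai.usage.prompt_tokens": 1693,
    "gen_ai.usage.completion_tokens": 71,
}


def token_usage(attributes: dict) -> dict:
    """Summarize token usage, accepting old and new attribute names."""
    prompt = attributes.get(
        "gen_ai.usage.input_tokens",
        attributes.get("gen_ai.usage.prompt_tokens", 0),
    )
    completion = attributes.get(
        "gen_ai.usage.output_tokens",
        attributes.get("gen_ai.usage.completion_tokens", 0),
    )
    return {"prompt": prompt, "completion": completion, "total": prompt + completion}


if __name__ == "__main__":
    print(token_usage(SLIDE_SPAN_ATTRIBUTES))
```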
  10. What about skills?

    Skills are often:
    • shell or python scripts
    • external tools
    • subprocesses
    These components:
    • are not instrumented
    • do not propagate trace context
    • break traces
    Java Agent -> TRACEPARENT env -> Shell script -> curl with trace header -> Service span
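The flow on slide 10 (agent, `TRACEPARENT` environment variable, script, outgoing request header) can be sketched in a few lines. The example below uses Python for the parent process instead of the Java agent from the talk, and a child Python process stands in for the shell script. It builds a W3C Trace Context `traceparent` value and hands it to the subprocess through the environment. The function names are hypothetical; only the header format follows the W3C specification.

```python
import os
import re
import secrets
import subprocess
import sys

# version "00", 32 hex chars of trace id, 16 hex chars of span id, 2 hex chars of flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a W3C traceparent value from existing ids."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str):
    """Return (trace_id, span_id, flags), or None if the value is invalid."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero ids are invalid according to the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, flags


def run_skill(command: list, traceparent: str) -> str:
    """Run a skill as a subprocess with TRACEPARENT set in its environment."""
    env = dict(os.environ, TRACEPARENT=traceparent)
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    tp = build_traceparent(secrets.token_hex(16), secrets.token_hex(8))
    # The "skill" here just echoes what it received.
    print(run_skill([sys.executable, "-c", "import os; print(os.environ['TRACEPARENT'])"], tp))
```

A shell-script skill would then forward the value on its outgoing call, for instance with `curl -H "traceparent: $TRACEPARENT" ...`, so that the receiving service can attach its span to the same trace.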
  11. Moving to Level 3

    - Matching traces to intent
    - The customer requested 2 pizzas
    - Then 2 pizzas were cooked and the order delivered ✅
    - The right ingredients were picked up from the inventory ✅
  12. Moving to Level 4

    To build autonomy you need to build trust
    - When the CookingAgent cooked 10K pizzas
    - 99% ✅
    - 1% failed
    - Can we accept refunding 100 pizzas when things go wrong, in exchange for letting our agent run unsupervised?
  13. Key Takeaways

    01 Observing agentic applications is quite hard
    02 OTel GenAI Semantic Conventions are a good step forward
    03 We need more conventions for workflows & skills
    04 Telemetry is critical to understand and validate agents