From Cloud Native applications to Multi Agent Distributed Systems

From Cloud Native to Agentic Applications
Mauricio Salatino, Dash0

ABOUT ME
Mauricio Salatino, Ecosystem Engineer, Dash0
@salaboy.com, @salaboy, linkedin.com/in/salaboy

New Landscape Agents • Frameworks • Protocols ▪ MCP ▪
A2A • Other stuff ▪ Skills Infra GPUs to run LLMs AI Gateways KV Caches

But Agents…?
What do we need to know about them as platform engineers?

Architecture ???????? ???????? ????????

Disclaimer  Business Agents Orchestrate business workflows, integrate APIs, and
automate enterprise logic to drive decisions.  Coding Agents Write, test, debug, and execute code autonomously to build software and solve technical tasks.

Model Context Protocol
- Anthropic Nov 2024 -> Now AAIF
- Client/Server architecture (LLM is the client)
- Provide tool definitions that LLMs can call when needed
FOLLOW US CLOUDNATIVEDAYSITALY.ORG
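A tool definition of the kind an MCP server advertises can be sketched as a plain dictionary. The field names `name`, `description`, and `inputSchema` follow the MCP tool shape; the `get_inventory` tool, its `ingredient` parameter, and the `missing_arguments` helper are hypothetical, invented here for the pizza example.

```python
import json

# Hypothetical tool for illustration; only the field names
# (name, description, inputSchema) come from the MCP tool shape.
get_inventory_tool = {
    "name": "get_inventory",
    "description": "Return how many units of an ingredient are in stock.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "ingredient": {"type": "string", "description": "Ingredient name"},
        },
        "required": ["ingredient"],
    },
}


def missing_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return the required arguments that a tool call left out."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


print(json.dumps(get_inventory_tool, indent=2))
print(missing_arguments(get_inventory_tool, {}))
```

The model only sees the description and schema, so a vague description or a loose schema tends to show up later as wrong or incomplete tool calls.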


Agent to Agent (A2A)
- Google - April 2025
- Agent to Agent discovery and communication
- Enterprise focus: security, long running, built on standards
- Modality agnostic (Text, Audio, Video)
- Complements MCP, it doesn’t compete

Skills
- Anthropic 15th Dec 2025
- https://agentskills.io/

Key Takeaways <LINK> 1 Month Ago

Architecture??
[Diagram: Pizza Store Agent, LLM, MCP Server, Skills, Inventory APIs, Drinks APIs, Ovens APIs]

https://github.com/salaboy/pizza-vibe DrinkAgent (In memory)

Agentic Engineering
• Level 1 - GenAI Assisted -> a chatbot assists in creating the order
• Level 2 - GenAI Augmented -> agents drive the operations, humans approve
• Level 3 - Spec-Centric -> analyze the order and validate what is delivered
• Level 4 - Selective Autonomy -> selecting drinks is auto-approved
• Level 5 - Full Autonomy -> a human is needed only when the oven breaks
BLOG https://www.salaboy.com/2026/06/08/reacting-to-ai/
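One way to read the levels is as an approval policy: who has to sign off on an action at each level. The sketch below is an interpretation, not something shown in the talk; `AutonomyLevel`, `AUTO_APPROVED_ACTIONS`, `needs_human_approval`, and the action names are all hypothetical.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    GENAI_ASSISTED = 1
    GENAI_AUGMENTED = 2
    SPEC_CENTRIC = 3
    SELECTIVE_AUTONOMY = 4
    FULL_AUTONOMY = 5


# Hypothetical set of actions trusted enough to skip approval at Level 4.
AUTO_APPROVED_ACTIONS = {"select_drinks"}


def needs_human_approval(level: AutonomyLevel, action: str, failure: bool = False) -> bool:
    """Decide whether a human must approve an agent action."""
    if level >= AutonomyLevel.FULL_AUTONOMY:
        # Level 5: humans step in only when something breaks.
        return failure
    if level == AutonomyLevel.SELECTIVE_AUTONOMY:
        # Level 4: only allow-listed actions run without approval.
        return failure or action not in AUTO_APPROVED_ACTIONS
    # Levels 1 to 3: a human approves every action.
    return True


print(needs_human_approval(AutonomyLevel.SELECTIVE_AUTONOMY, "select_drinks"))
print(needs_human_approval(AutonomyLevel.FULL_AUTONOMY, "cook_pizza", failure=True))
```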

The first step: Understanding your Agents

Why are these apps so hard to observe?
1. Traditional tracing assumes linear request/response flows
2. Context propagation often breaks between agents, tools, and model calls
3. Agent decisions create dynamic, non-deterministic workflows
4. Execution spans many systems (LLMs, APIs, MCP servers, skills)
5. Key context lives outside the runtime (prompts, reasoning)

Order pizza trace
Numbers:
• ~275 spans
• ~48 seconds
• 10 services/agents
Trace includes:
• LLM reasoning loops
• MCP tool calls
• Workflow orchestration
• Polling loops
• Service calls

More Telemetry != Better Telemetry
Observation: 275+ spans per order
But,
• many spans represent infrastructure
• some represent retries or polling (which can create a lot of noise)
• some represent framework calls (chatty MCP / A2A)
Goal: meaningful spans, not just more spans
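The goal of keeping meaningful spans can be sketched as a simple filter over span categories. This is a toy model, assuming each span already carries a category label; the `Span` class, the `kind` values, and `NOISY_KINDS` are hypothetical, and in a real pipeline this kind of filtering would more likely live in a sampler or collector configuration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    kind: str  # hypothetical label: "llm", "tool", "workflow", "polling", "infra"


# Hypothetical categories treated as noise for the order-pizza trace.
NOISY_KINDS = {"polling", "infra"}


def meaningful_spans(spans: list[Span]) -> list[Span]:
    """Keep spans that describe agent behaviour, drop infrastructure and polling."""
    return [span for span in spans if span.kind not in NOISY_KINDS]


trace = [
    Span("completion claude-haiku-4-5", "llm"),
    Span("check order status", "polling"),
    Span("check order status", "polling"),
    Span("call inventory tool", "tool"),
    Span("connection pool acquire", "infra"),
]

for span in meaningful_spans(trace):
    print(span.name)
```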

Agent Reasoning (GenAI SemConv)
Example span from the trace: completion claude-haiku-4-5
Attributes:
  gen_ai.request.model = claude-haiku-4-5
  gen_ai.prompt = "You are a pizza cooking agent..."
  gen_ai.completion = "I'll cook a Pepperoni pizza..."
  gen_ai.usage.prompt_tokens = 1693
  gen_ai.usage.completion_tokens = 71
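Once spans carry these attributes, token usage can be read straight off the trace. A minimal sketch, assuming the span attributes have been exported as a dictionary with the attribute names from the example span; the `total_tokens` helper is hypothetical.

```python
# Attribute names and values copied from the example span above.
span_attributes = {
    "gen_ai.request.model": "claude-haiku-4-5",
    "gen_ai.usage.prompt_tokens": 1693,
    "gen_ai.usage.completion_tokens": 71,
}


def total_tokens(attributes: dict) -> int:
    """Sum prompt and completion tokens, treating missing values as zero."""
    prompt = attributes.get("gen_ai.usage.prompt_tokens", 0)
    completion = attributes.get("gen_ai.usage.completion_tokens", 0)
    return prompt + completion


print(total_tokens(span_attributes))  # 1764
```

Summing this value over every completion span in an order gives a rough per-order token count, which is one way to compare agents or prompts.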

Agent Reasoning(GenAI SemConv)

What about skills?
Skills are often:
• shell or python scripts
• external tools
• subprocesses
These components:
• are not instrumented
• do not propagate trace context
• break traces
Java Agent -> TRACEPARENT env -> Shell script -> curl with trace header -> Service span
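The TRACEPARENT hand-off can be sketched end to end with the standard library. This is an illustration under assumptions: the parent process here is Python rather than the Java agent from the slide, the skill is a one-line stand-in script, and `new_traceparent` generates fresh IDs where a real agent would use the IDs of its current span. The value format is the W3C Trace Context `traceparent` layout (version, trace ID, parent span ID, flags).

```python
import os
import secrets
import subprocess
import sys


def new_traceparent() -> str:
    """Build a W3C traceparent value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


# Stand-in for a skill script: it reads TRACEPARENT from its environment
# and prints the header it would attach to an outgoing HTTP request.
SKILL_SOURCE = 'import os; print("traceparent: " + os.environ.get("TRACEPARENT", ""))'


def run_skill(traceparent: str) -> str:
    """Run the skill as a subprocess with the trace context in its environment."""
    env = dict(os.environ, TRACEPARENT=traceparent)
    result = subprocess.run(
        [sys.executable, "-c", SKILL_SOURCE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


parent = new_traceparent()
print(run_skill(parent))
```

A shell skill would do the equivalent by passing the variable as a request header, for example with `curl -H "traceparent: $TRACEPARENT"`, so that the downstream service span joins the same trace.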

Trace from Skills Execution

Moving to Level 3
- Matching traces to intent
- The customer requested 2 pizzas
- Then 2 pizzas were cooked and the order delivered ✅
- The right ingredients were picked up from the inventory ✅
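Matching a trace to intent can be sketched as comparing what was requested with what the spans say was cooked. The span shape, the `cook_pizza` span name, and the `pizza.type` attribute below are hypothetical; they stand in for whatever the real trace records.

```python
from collections import Counter


def cooked_pizzas(spans: list[dict]) -> Counter:
    """Count cooked pizzas per type from hypothetical `cook_pizza` spans."""
    return Counter(
        span["attributes"]["pizza.type"]
        for span in spans
        if span["name"] == "cook_pizza"
    )


def matches_intent(requested: dict[str, int], spans: list[dict]) -> bool:
    """True when the trace shows exactly the pizzas the customer asked for."""
    return cooked_pizzas(spans) == Counter(requested)


order = {"pepperoni": 2}
trace = [
    {"name": "cook_pizza", "attributes": {"pizza.type": "pepperoni"}},
    {"name": "check_inventory", "attributes": {}},
    {"name": "cook_pizza", "attributes": {"pizza.type": "pepperoni"}},
]

print(matches_intent(order, trace))
```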

Moving to Level 4
To build autonomy you need to build trust
- When the CookingAgent cooked 10K pizzas:
  - 99% ✅
  - 1% failed
- Can we accept refunding 100 pizzas when things go wrong, in exchange for letting our agent run unsupervised?
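The trade-off comes down to simple arithmetic: 1% of 10K orders is 100 failed pizzas. A small sketch of that check, where the `refund_budget` threshold and both function names are hypothetical:

```python
def expected_failures(total_orders: int, success_rate: float) -> int:
    """Number of orders expected to fail at the observed success rate."""
    return round(total_orders * (1 - success_rate))


def can_run_unsupervised(total_orders: int, success_rate: float, refund_budget: int) -> bool:
    """True when the expected refunds fit within what the business accepts."""
    return expected_failures(total_orders, success_rate) <= refund_budget


print(expected_failures(10_000, 0.99))  # 100
print(can_run_unsupervised(10_000, 0.99, refund_budget=150))
```

The success rate has to come from telemetry on real orders, which is why the trace-to-intent validation from Level 3 is a prerequisite for this decision.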

Key Takeaways
01 Observing agentic applications is quite hard
02 OTel GenAI Semantic Conventions are a good step forward
03 We need more conventions for workflows & skills
04 Telemetry is critical to understand and validate agents

Thank You!

#Demo
