AI · Give AI workloads a clear, scalable path.
AI for Platform · Let GenAI run and accelerate the platform.

THREE PARTS
01 Platform Engineering 2.0: Evolution, not revolution
02 Defining the dual roles: Platform for AI × AI for Platform
03 Guarding the bottom line: Cost & security in the GenAI era
One paved, self-service path that hides infrastructure complexity, so teams ship fast and safe without each one reinventing the wheel.

WITHOUT A PLATFORM
× Every team wires up its own infra
× Security & compliance bolted on, by hand
× Cognitive load lands on developers

→ WITH A PLATFORM · THE GOLDEN PATH
✓ One shared, self-service path
✓ Security & compliance built in by default
✓ Builders stay focused on the product
AI now writes more code than ever. The constraint moved downstream, from producing software to validating and delivering it safely.

YESTERDAY'S BOTTLENECK: Writing the code
→ TODAY'S BOTTLENECK: Validating & delivering it

Source · circleci.com/resources/2026-state-of-software-delivery · stackoverflow.blog/2026/06/18/the-new-bottleneck
doesn't restart at each phase; it accumulates. What changes is who the platform serves and what it must carry.

2015–2018 · DevOps era: “You build it, you run it.” Shared ownership, fast feedback, continuous delivery.
2019–2025 · Platform 1.0: Golden paths and self-service IDPs reduce cognitive load on developers.
2024–2025 · Inflection point: AI & sovereignty pressures, AI-driven threats, supply-chain shock.
2026+ · Platform 2.0: AI-native, autonomous, business-aligned.

WHAT DOESN'T CHANGE: Platform as product

MEANWHILE · THE GENAI CURVE
Nov 2022 · ChatGPT: GenAI goes mainstream
Mar 2023 · GPT-4: Reasoning-grade coding
2024 · Coding agents: Copilot, Cursor, Devin
Late 2024 · MCP: Open tool-use standard
2025 · Agentic coding: Claude Code, ADK, A2A

Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
PLATFORM ENGINEERING 1.0 · LEFT
Push security, testing & infra config onto developers. The result: severe cognitive overload, and a safe path that depends on everyone remembering to follow it.

→ PLATFORM ENGINEERING 2.0 · SHIFT DOWN
Embed security, FinOps, compliance & guardrails into the platform itself. Immutable defaults. Secure by default. The safe path becomes the only path.

Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
AI Operating System
The same shift as the 2.0 timeline, seen as four platform generations, to locate where you sit today and where this is heading.

V1 · Cloud Native (2018–2023): Ship software faster
V2 · AI-Native (2024–2026): Build with AI
V3 · Agentic (2026–2028): Agents become users
V4 · AI Operating System (2028+): Enterprise runs on agents

Source · Weave Intelligence · State of Platform Engineering Vol. 4 (2025)
the ceiling
Golden Paths kept teams inside the guardrails, until GenAI and agentic workflows ran into five structural ceilings.

01 AI-Blind Architecture: Built for containers, with no native GPU/TPU scheduling, MCP servers or agent guardrails.
02 Developer-Only Focus: Security, data science, FinOps & agents aren't first-class users, so shadow IT spreads.
03 Reactive, not Proactive: No cost-aware provisioning or drift detection; AI spend spikes overnight.
04 Golden Paths → Golden Cages: Rigid templates throttle the fast experimentation AI work demands.
05 Rigid & Static: Compliance is a snapshot, not continuous; hard to repave across 200+ CNCF projects.

The very Golden Paths that delivered speed now cap what AI workloads can do.

Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
ceiling
Each ceiling gets a matching pillar: the evolution from a developer IDP into an agentic enterprise control plane.

AI-Native Platform (breaks · AI-Blind Architecture): AI as a first-class workload and agents as new platform users.
Multi-Persona Experience (breaks · Developer-Only Focus): Six personas: app dev, platform eng, business, security, data/ML, and AI agents.
Embedded FinOps (breaks · Reactive provisioning): Cost intelligence moves to the moment of provisioning.
Security Shifts Down (breaks · the AI attack surface): Security sinks into platform & runtime, invisible & tamper-proof.
Composable by Design (breaks · Rigid & Static): Pluggable, API-first planes instead of a monolith.

Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
START HERE · Platform for AI: Give data scientists & AI engineers a clear, scalable golden path.
ROLE 2 · AI for Platform: Let GenAI run and accelerate the platform itself.
six parallel planes

Software Eng: golden paths + CI/CD
SRE · DevOps: agents + observability
Data Scientist: GPUs + notebooks
ML Engineer: training + registry
AI Engineer: agent runtime + evals

One platform, many front doors, and it spans six parallel planes.

P1 Persona Experience: the five surfaces above
P2 Agent Orchestration: Agent runtime · MCP Gateway · A2A
P3 Knowledge & Data: BigQuery · Vertex AI Vector Search · Data Catalog
P4 Integration & Delivery: GitHub Actions · Platform Orchestrator · Terraform · Argo CD
P5 Resource & Runtime: GKE · Kueue + DWS · GPU pools · Image Streaming
P6 Observability & Security: Langfuse / Arize · OPA · Cloud IAM

MOST OFTEN MISSING: P2 Agents & P3 Knowledge are the two planes most teams are missing. The next two slides give each its own deep-dive.

Source · Reference Architecture for an AI/ML Internal Developer Platform on GCP, platformengineering.org (2026)
by six planes
The pillars say what the platform must be; the planes are where each one lives. They don't map one-to-one, and that's the point.

PILLAR ↓ PLANE →: P1 Experience · P2 Agents · P3 Knowledge · P4 Delivery · P5 Runtime · P6 Observ.+Sec.

01 Multi-Persona Experience: every persona gets a front door
02 AI-Native Platform: agents as users, AI as workload; spans three planes
03 Embedded FinOps: cost decided at provisioning time
04 Security Shifts Down: guardrails baked into the runtime
05 Composable by Design: every plane independently swappable

Composable by Design is the meta-principle: it's the reason the platform is six independently swappable planes at all.

Source · Platform Engineering 2.0 & Reference Architecture for an AI/ML IDP, platformengineering.org (2026)
for agents
RAG, vector & data stores, and the semantic layer all live here. This plane turns scattered enterprise knowledge into retrievable, governed context. Code alone is never enough.

RAW KNOWLEDGE
Architecture & standards
Runbooks & SOPs
Incident history
Code, docs & data lineage

→ THE P3 MACHINERY
RAG pipeline: chunk · embed · retrieve · rerank
Vector store: Vertex AI Vector Search
Semantic layer: shared metrics & meaning
Data stores: BigQuery · feature store · catalog
+ knowledge graph · governance & access control

→ OUTCOME
Agent grounded & context-aware

Source · Reference Architecture for an AI/ML IDP on GCP, platformengineering.org (2026)
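The four RAG stages named on this slide (chunk · embed · retrieve · rerank) can be sketched in a few lines. This is a minimal illustration only: the bag-of-words "embedding" stands in for a real embedding model, the in-memory list stands in for a vector store such as Vertex AI Vector Search, and every function name and sample document below is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 12) -> list[str]:
    """Split text into fixed-size word windows (real pipelines often chunk by structure)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    scored = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Toy reranker: prefer chunks that cover more distinct query terms."""
    terms = set(embed(query))
    return sorted(candidates, key=lambda c: len(terms & set(embed(c))), reverse=True)


# Made-up knowledge snippets standing in for runbooks, standards and incident history.
docs = [
    "Runbook: restart the payments service by draining traffic first.",
    "Standard: all staging environments use the golden-path Terraform module.",
    "Incident: payments latency spike traced to an exhausted connection pool.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]
query = "payments latency incident"
top = rerank(query, retrieve(query, index))
```

Here `top[0]` is the incident snippet, which is the context an agent would be grounded on before acting. Governance and access control, which the slide places in the same plane, are deliberately left out of this sketch.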
Gemini Enterprise Agent Platform
A managed component that plugs into your platform, not a replacement for it. It covers the agent-facing planes; your IDP still provides delivery & infrastructure, and stitches it in. Follow one request ① → ⑥ below.

Covered by GEAP · P1 · P2 · P3 · P6 (partial)
Your IDP provides · P4 delivery · P5 infra · integration

Managed by GEAP: the agent plane
P6 Governance: ✦ Agent Registry · Policies & AI protection
1 Gemini Enterprise Workspace Custom apps
2 ✦ Agent Gateway
3 P2 ✦ Agent Runtime · Agent · Agent Identity
4 Memory · Model
5 ✦ Agent Gateway · Other agents · Tools
6 P6 Agent observability

YOUR IDP · THE FOUNDATION IT SITS ON
P4 · Delivery: CI/CD · PR · policy gates
P5 · Infra: GKE · Kueue · GPU · net
Integration: wires GEAP into the golden path

THE ASK: “Stand up a feature-flag service and ship it.”
① State intent: in Gemini Enterprise
② Governed entry: via Agent Gateway
③ Plan & run: with Agent Identity
④ Ground & reason: on memory + model
⑤ Act on tools: governed outbound → IDP delivery
⑥ Observe & approve: human-on-the-loop

Source · Gemini Enterprise Agent Platform overview, Google Cloud documentation (docs.cloud.google.com)
a co-author
It doesn't replace the engineer; it accelerates the three things platform work does most. How the loop actually works → next slide.

Generate: Produce Infrastructure-as-Code straight from intent.
Debug: Get architecture-aware diagnosis and design suggestions.
Deliver: Ship faster, and at higher, more consistent quality.
Inside the co-author loop
Not a chatbot: a grounded, gated codegen loop. The agent reads the platform's own context, and its output is checked before a human ever sees it.

THE TASK: “Provision a staging environment for the payments service.”

P3 · GROUND · Pull the context: Golden-path modules, live Terraform state, runbooks; not a blank prompt. (Backstage module registry · RAG)
→ P2 · GENERATE · Synthesize the IaC: The agent runtime writes the manifests and calls tools to do it. (ADK agent · Terraform / Crossplane · MCP servers)
→ P4 P6 · VERIFY · Gate before merge: Diff, policy and security all checked before a human sees it. (terraform plan · OPA · tfsec / Checkov)
→ Human · DELIVER · Open a PR: The engineer reviews a diff, not a prompt, then approves. (GitHub PR · human review)

Grounded in real context, gated before merge: that's why Level 1 is production-grade, not a demo.

Source · Platform Engineering 2.0 — AI-assisted engineering · Backstage, Terraform, OPA, MCP (2026)
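The VERIFY → DELIVER hand-off can be sketched as a list of gates that must all pass before a PR is opened. This is an illustration under stated assumptions: the three gate functions are hypothetical string checks standing in for `terraform plan`, an OPA policy evaluation, and a tfsec / Checkov scan. No real CLI or API is invoked, and the specific rules (a required staging label, no open ingress CIDR) are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str = ""


# Hypothetical stand-in for `terraform plan`: fail when there is nothing to apply.
def plan_gate(iac: str) -> GateResult:
    ok = bool(iac.strip())
    return GateResult("plan", ok, "" if ok else "empty change set")


# Hypothetical stand-in for an OPA policy: resources must carry a staging label.
def policy_gate(iac: str) -> GateResult:
    ok = 'environment = "staging"' in iac
    return GateResult("policy", ok, "" if ok else "missing environment label")


# Hypothetical stand-in for a tfsec / Checkov rule: no world-open ingress.
def security_gate(iac: str) -> GateResult:
    ok = "0.0.0.0/0" not in iac
    return GateResult("security", ok, "" if ok else "open ingress CIDR")


GATES: list[Callable[[str], GateResult]] = [plan_gate, policy_gate, security_gate]


def verify(iac: str) -> list[GateResult]:
    """Run every gate so all failures are reported, not just the first."""
    return [gate(iac) for gate in GATES]


def deliver(iac: str) -> str:
    """Open a PR only when every gate passes; otherwise return the failures."""
    failures = [r for r in verify(iac) if not r.passed]
    if failures:
        return "rejected: " + ", ".join(f"{r.name} ({r.detail})" for r in failures)
    return "pr-opened"


good = 'environment = "staging"\ncidr = "10.0.0.0/8"'
bad = 'cidr = "0.0.0.0/0"'
print(deliver(good))  # pr-opened
print(deliver(bad))   # rejected: policy (missing environment label), security (open ingress CIDR)
```

Running every gate rather than stopping at the first failure gives the generating agent the full list to fix in one pass, so the human reviewer only ever sees output that has already cleared all checks.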
The planes you built: now the leash & the dashboard
Nothing new to build here. Two planes from Part 1, the agent plane and observability, are exactly what make agent autonomy safe to trust.

← BUILT IN PART 1 · P2 · THE LEASH
An agent acts only on its leash. Identity proves who it is; the registry bounds what it may call. No identity, no action.
Agent Identity Registry Runtime Governance ADK · MCP

← BUILT IN PART 1 · P6 · THE DASHBOARD
The signal you trust it on: Token cost per task, per team · TTFT latency · TPOT throughput · Drift · LLM-as-a-judge
Not infra metrics, but the live read on whether an autonomous action is worth approving.

Both built in Part 1; together they're what lets you hand over the loop. → Living architecture

Source · Platform Engineering 2.0 agent plane · LLM observability — Langfuse, Arize, Splunk (2026)
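Two of the dashboard signals, TTFT and TPOT, can be computed directly from token arrival timestamps. The sketch below assumes the common definitions: TTFT is the time to first token, and TPOT is the mean time per output token after the first, so decode throughput is its reciprocal. The function names and sample timestamps are hypothetical, not taken from Langfuse, Arize or Splunk.

```python
def time_to_first_token(request_sent: float, token_times: list[float]) -> float:
    """TTFT: delay between sending the request and receiving the first token."""
    if not token_times:
        raise ValueError("no tokens were produced")
    return token_times[0] - request_sent


def time_per_output_token(token_times: list[float]) -> float:
    """TPOT: mean gap between consecutive output tokens after the first."""
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def tokens_per_second(token_times: list[float]) -> float:
    """Decode throughput, the reciprocal of TPOT."""
    tpot = time_per_output_token(token_times)
    return 1.0 / tpot if tpot > 0 else 0.0


# Made-up timestamps in seconds, relative to a request sent at t = 0.0.
arrivals = [0.5, 0.75, 1.0, 1.25]
ttft = time_to_first_token(0.0, arrivals)   # 0.5 s
tpot = time_per_output_token(arrivals)      # 0.25 s per token
rate = tokens_per_second(arrivals)          # 4.0 tokens per second
```

Keeping TTFT and TPOT separate matters because they degrade for different reasons: queueing and prompt processing move the first, while decode load moves the second.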
HUMAN-ON-THE-LOOP · Living architecture in action

YESTERDAY: An engineer reads the logs, hand-writes the Terraform, and pushes it through CI.

TRIGGER · Anomaly detected: An alert fires in production.
→ P2 ORCHESTRATE · Repair agent triggers: Owns the incident end to end.
→ P3 CONTEXT · Pulls the context: Runbooks, incidents, lineage.
→ P4 SANDBOX · Builds & tests the fix: Kernel-isolated Agent Sandbox.
→ P5 GUARDRAIL · Opens a guarded PR: Through the agent gateway.
→ H APPROVE · Human approves: One click, on the loop.

The human shifts from executor to approver: Human-on-the-Loop.

Source · Platform Engineering 2.0 — Living Architecture, platformengineering.org (2026)
Three control points turn spend from a month-end surprise into a guardrail enforced before anything runs.

01 Meter. Every LLM call routes through a model gateway; tokens are counted, team-labelled, and streamed to BigQuery for showback. (LiteLLM / Apigee · BigQuery chargeback)
→ 02 Quota. GPU capacity is bounded by Kueue ResourceFlavors & quotas; estimated cost shows up right in the PR. (Kueue quota · MIG · spot · scale-to-zero)
→ 03 Gate. The pre-deploy cost gate is an admission controller; an over-budget manifest is rejected before it reaches the cluster. (OPA Gatekeeper / Kyverno)

Cost stops being a month-end reconciliation; it's a policy decision at admission time.

Source · Platform Engineering 2.0 FinOps · Kueue, OPA Gatekeeper, Kyverno, LiteLLM (2026)
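The Meter and Gate control points can be sketched as plain functions. In practice the gate would be expressed as an OPA Gatekeeper or Kyverno policy and the meter would live in the model gateway; the Python below only illustrates the decision logic. The GPU prices, the 730-hour month, the manifest fields, and all names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical hourly prices per GPU type; real values come from billing data.
GPU_HOURLY_USD = {"l4": 1.0, "a100": 4.0}
HOURS_PER_MONTH = 730  # assumed average month length


class TokenMeter:
    """Counts tokens per team, as a model gateway might before exporting for showback."""

    def __init__(self) -> None:
        self.tokens_by_team: dict[str, int] = defaultdict(int)

    def record(self, team: str, prompt_tokens: int, completion_tokens: int) -> None:
        self.tokens_by_team[team] += prompt_tokens + completion_tokens


def estimate_monthly_cost(manifest: dict) -> float:
    """Estimated cost = GPU count x hourly price x hours per month."""
    return manifest["gpu_count"] * GPU_HOURLY_USD[manifest["gpu_type"]] * HOURS_PER_MONTH


def admit(manifest: dict, team_budget_usd: dict[str, float]) -> tuple[bool, str]:
    """Admission-style decision: reject a manifest whose estimate exceeds the team budget."""
    cost = estimate_monthly_cost(manifest)
    budget = team_budget_usd.get(manifest["team"], 0.0)  # unknown teams get no budget
    if cost > budget:
        return False, f"denied: estimated ${cost:.0f}/month exceeds budget ${budget:.0f}"
    return True, f"admitted: estimated ${cost:.0f}/month"


meter = TokenMeter()
meter.record("ml", prompt_tokens=100, completion_tokens=50)
meter.record("ml", prompt_tokens=100, completion_tokens=50)

budgets = {"ml": 5000.0}
big = {"team": "ml", "gpu_type": "a100", "gpu_count": 2}
small = {"team": "ml", "gpu_type": "l4", "gpu_count": 1}
print(admit(big, budgets))    # (False, 'denied: estimated $5840/month exceeds budget $5000')
print(admit(small, budgets))  # (True, 'admitted: estimated $730/month')
```

Defaulting unknown teams to a zero budget makes the gate fail closed, which matches the slide's point that the check runs before anything reaches the cluster.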
a probabilistic system
New threats (shadow AI, prompt injection, model poisoning, agent exfiltration) outrun shift-left checklists. The platform answers with four gates, invisible to the developer.

01 Identity. Non-human workload identity per agent, with short-lived mTLS certs. No more shared service accounts. (SPIFFE / SPIRE)
02 Gateway. Every in/outbound message is filtered; prompt injection & DLP are caught at the edge of the agent. (Agent Gateway · Model Armor)
03 Authorization. Tools are allow-listed; an agent can only call what's been registered, nothing else. (MCP Gateway · Agent Registry)
04 Isolation. Agent-run code executes in a kernel-isolated sandbox; a compromised agent can't break out. (gVisor sandbox)

We don't trust the AI to behave; we use deterministic infrastructure to contain a probabilistic system.

Source · SPIFFE/SPIRE · Model Armor · MCP · gVisor · Platform Engineering 2.0 (2026)
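The Authorization gate (tools are allow-listed per agent) can be sketched as a small registry plus a gateway that checks it on every call. This is a hypothetical illustration, not the MCP Gateway or Agent Registry API: the class names, the SPIFFE-style ID string, and the sample tools are all invented, and the agent identity is assumed to have been authenticated upstream (for example via mTLS) before it reaches this code.

```python
from typing import Any, Callable


class ToolNotAllowed(PermissionError):
    """Raised when an agent calls a tool it has not been registered for."""


class AgentRegistry:
    """Maps each agent identity to the set of tool names it may call."""

    def __init__(self) -> None:
        self._allowed: dict[str, set[str]] = {}

    def register(self, agent_id: str, tools: set[str]) -> None:
        self._allowed[agent_id] = set(tools)

    def is_allowed(self, agent_id: str, tool: str) -> bool:
        # Unknown agents have an empty allow-list: no identity, no action.
        return tool in self._allowed.get(agent_id, set())


class ToolGateway:
    """Single choke point for tool calls; deny by default."""

    def __init__(self, registry: AgentRegistry, tools: dict[str, Callable[..., Any]]) -> None:
        self._registry = registry
        self._tools = tools

    def call(self, agent_id: str, tool: str, **kwargs: Any) -> Any:
        if tool not in self._tools or not self._registry.is_allowed(agent_id, tool):
            raise ToolNotAllowed(f"{agent_id} may not call {tool}")
        return self._tools[tool](**kwargs)


# Hypothetical tools and agent identity, for illustration only.
tools = {
    "read_runbook": lambda name: f"runbook:{name}",
    "delete_cluster": lambda name: f"deleted:{name}",
}
registry = AgentRegistry()
repair_agent = "spiffe://example.org/agent/repair"
registry.register(repair_agent, {"read_runbook"})
gateway = ToolGateway(registry, tools)

print(gateway.call(repair_agent, "read_runbook", name="payments"))  # runbook:payments
try:
    gateway.call(repair_agent, "delete_cluster", name="prod")
except ToolNotAllowed as err:
    print(err)  # spiffe://example.org/agent/repair may not call delete_cluster
```

The check is deterministic and lives outside the model, so a prompt-injected agent can ask for `delete_cluster` but the call never executes.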
frees the data scientist's productivity
⇄
AI for Platform expands the ops team's leverage

Under Platform Engineering 2.0, the two reinforce each other.
bolt-on LLM
Don't discard existing investment; upgrade it into an AI-era foundation: V1 → V2 → V3.

02 Stand up the Agent & Knowledge planes: MCP gateway, knowledge graph, tool registry, agent identity & governance.
03 From passive monitoring to active guardrails: Shift security & FinOps down, with deterministic policy-as-code at the foundation.