When Platform Engineering Meets GenAI

June 25, 2026

suci

Transcript

  1. Shu Hsi Lin (he/him)

    ABOUT Working with data and people
    INTERESTED IN Agile · Engineering culture · Developer experience · Team coaching · Data engineering
    sciwork member
  2. AGENDA The shape of this talk

    TWO FORCES
    Platform for AI: Give AI workloads a clear, scalable path.
    AI for Platform: Let GenAI run and accelerate the platform.

    THREE PARTS
    01 Platform Engineering 2.0: Evolution, not revolution
    02 Defining the dual roles: Platform for AI × AI for Platform
    03 Guarding the bottom line: cost & security in the GenAI era
  3. FIRST, THE GROUND WE STAND ON What platform engineering is

    One paved, self-service path that hides infrastructure complexity, so teams ship fast and safe without each one reinventing the wheel.

    WITHOUT A PLATFORM
    × Every team wires up its own infra
    × Security & compliance bolted on, by hand
    × Cognitive load lands on developers

    → WITH A PLATFORM · THE GOLDEN PATH
    ✓ One shared, self-service path
    ✓ Security & compliance built in by default
    ✓ Builders stay focused on the product
  4. THE TURNING POINT Code output is no longer the bottleneck

    AI now writes more code than ever. The constraint moved downstream, from producing software to validating and delivering it safely.

    YESTERDAY'S BOTTLENECK: Writing the code
    → TODAY'S BOTTLENECK: Validating & delivering it

    Source · circleci.com/resources/2026-state-of-software-delivery · stackoverflow.blog/2026/06/18/the-new-bottleneck
  5. PLATFORM ENGINEERING 2.0 An evolution, not a reset

    The discipline doesn't restart at each phase; it accumulates. What changes is who the platform serves and what it must carry.

    2015–2018 · DevOps era: “You build it, you run it.” Shared ownership, fast feedback, continuous delivery.
    2019–2025 · Platform 1.0: Golden paths and self-service IDPs reduce cognitive load on developers.
    2024–2025 · Inflection point: AI & sovereignty pressures, AI-driven threats, supply-chain shock.
    2026+ · Platform 2.0: AI-native, autonomous, business-aligned.

    WHAT DOESN'T CHANGE: Platform as product

    MEANWHILE · THE GENAI CURVE
    Nov 2022 · ChatGPT: GenAI goes mainstream
    Mar 2023 · GPT-4: Reasoning-grade coding
    2024 · Coding agents: Copilot, Cursor, Devin
    Late 2024 · MCP: Open tool-use standard
    2025 · Agentic coding: Claude Code, ADK, A2A

    Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
  6. A CHANGE IN STRATEGY Shift Left → Shift Down

    SHIFT LEFT · PLATFORM ENGINEERING 1.0
    Push security, testing & infra config onto developers. The result: severe cognitive overload, and a safe path that depends on everyone remembering to follow it.

    → SHIFT DOWN · PLATFORM ENGINEERING 2.0
    Embed security, FinOps, compliance & guardrails into the platform itself. Immutable defaults. Secure by default. The safe path becomes the only path.

    Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
  7. THE EVOLUTION OF PLATFORM PHILOSOPHY From Cloud Native to an AI Operating System

    The same shift as the 2.0 timeline, seen as four platform generations, to locate where you sit today and where this is heading.

    V1 · Cloud Native · 2018–2023: Ship software faster
    V2 · AI-Native · 2024–2026: Build with AI
    V3 · Agentic · 2026–2028: Agents become users
    V4 · AI Operating System · 2028+: Enterprise runs on agents

    Source · Weave Intelligence · State of Platform Engineering Vol. 4 (2025)
  8. COMPARISON & EVOLUTION · PLATFORM 1.0 Where Platform 1.0 hits the ceiling

    Golden Paths kept teams inside the guardrails, until GenAI and agentic workflows ran into five structural ceilings.

    01 AI-Blind Architecture: Built for containers; no native GPU/TPU scheduling, MCP servers or agent guardrails.
    02 Developer-Only Focus: Security, data science, FinOps & agents aren't first-class users, so shadow IT spreads.
    03 Reactive, not Proactive: No cost-aware provisioning or drift detection; AI spend spikes overnight.
    04 Golden Paths → Golden Cages: Rigid templates throttle the fast experimentation AI work demands.
    05 Rigid & Static: Compliance is a snapshot, not continuous; hard to repave across 200+ CNCF projects.

    The very Golden Paths that delivered speed now cap what AI workloads can do.

    Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
  9. COMPARISON & EVOLUTION · PLATFORM 2.0 Five pillars break the ceiling

    Each ceiling gets a matching pillar: the evolution from a developer IDP into an agentic enterprise control plane.

    AI-Native Platform (breaks · AI-Blind Architecture): AI as a first-class workload and agents as new platform users.
    Multi-Persona Experience (breaks · Developer-Only Focus): Six personas: app dev, platform eng, business, security, data/ML, and AI agents.
    Embedded FinOps (breaks · Reactive provisioning): Cost intelligence moves to the moment of provisioning.
    Security Shifts Down (breaks · the AI attack surface): Security sinks into platform & runtime, invisible & tamper-proof.
    Composable by Design (breaks · Rigid & Static): Pluggable, API-first planes instead of a monolith.

    Source · Platform Engineering 2.0: An evolution for the AI era, platformengineering.org (2026)
  10. PART 2 Defining the dual roles

    ROLE 1 · WE START HERE
    Platform for AI: Give data scientists & AI engineers a clear, scalable golden path.

    ROLE 2
    AI for Platform: Let GenAI run and accelerate the platform itself.
  11. AI/ML IDP ON GOOGLE CLOUD Every persona, one platform, six parallel planes

    Software Eng: golden paths + CI/CD
    SRE · DevOps: agents + observability
    Data Scientist: GPUs + notebooks
    ML Engineer: training + registry
    AI Engineer: agent runtime + evals

    One platform, many front doors, and it spans six parallel planes.

    P1 Persona Experience: the five surfaces above
    P2 Agent Orchestration: Agent runtime · MCP Gateway · A2A
    P3 Knowledge & Data: BigQuery · Vertex AI Vector Search · Data Catalog
    P4 Integration & Delivery: GitHub Actions · Platform Orchestrator · Terraform · Argo CD
    P5 Resource & Runtime: GKE · Kueue + DWS · GPU pools · Image Streaming
    P6 Observability & Security: Langfuse / Arize · OPA · Cloud IAM

    MOST OFTEN MISSING: P2 Agents & P3 Knowledge are the two planes most teams are missing; the next two slides give each its own deep-dive.

    Source · Reference Architecture for an AI/ML Internal Developer Platform on GCP, platformengineering.org (2026)
  12. PRINCIPLES → ARCHITECTURE · HOW THEY ALIGN Five principles, realized by six planes

    The pillars say what the platform must be; the planes are where each one lives. They don't map one-to-one, and that's the point.

    PILLAR ↓ PLANE →: P1 Experience · P2 Agents · P3 Knowledge · P4 Delivery · P5 Runtime · P6 Observ.+Sec.

    01 Multi-Persona Experience: every persona gets a front door
    02 AI-Native Platform: agents as users, AI as workload; spans three planes
    03 Embedded FinOps: cost decided at provisioning time
    04 Security Shifts Down: guardrails baked into the runtime
    05 Composable by Design: every plane independently swappable

    Composable by Design is the meta-principle: it's the reason the platform is six independently swappable planes at all.

    Source · Platform Engineering 2.0 & Reference Architecture for an AI/ML IDP, platformengineering.org (2026)
  13. PLANE 3 · KNOWLEDGE & DATA The knowledge plane: context for agents

    RAG, vector & data stores, and the semantic layer all live here. This plane turns scattered enterprise knowledge into retrievable, governed context; code alone is never enough.

    RAW KNOWLEDGE
    Architecture & standards
    Runbooks & SOPs
    Incident history
    Code, docs & data lineage

    → THE P3 MACHINERY
    RAG pipeline: chunk · embed · retrieve · rerank
    Vector store: Vertex AI Vector Search
    Semantic layer: shared metrics & meaning
    Data stores: BigQuery · feature store · catalog
    + knowledge graph · governance & access control

    → OUTCOME
    Agent grounded & context-aware

    Source · Reference Architecture for an AI/ML IDP on GCP, platformengineering.org (2026)
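The four RAG stages named on this slide (chunk, embed, retrieve, rerank) can be sketched in a few lines. This is a toy illustration only: the bag-of-words "embedding", the sample documents, and every function name are assumptions made for the example, not the reference architecture's implementation. A real knowledge plane would use an embedding model and a vector store such as the Vertex AI Vector Search the slide mentions.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=12):
    """Split a document into fixed-size word windows (toy chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs):
    """Chunk and embed every document, keeping the source for governance."""
    index = []
    for source, text in docs.items():
        for piece in chunk(text):
            index.append({"source": source, "text": piece, "vector": embed(piece)})
    return index


def retrieve(query, index, k=3):
    """First pass: top-k chunks by cosine similarity to the query."""
    q = embed(query)
    return sorted(index, key=lambda item: cosine(q, item["vector"]), reverse=True)[:k]


def rerank(query, candidates):
    """Second pass: order candidates by how many distinct query terms they contain."""
    q_terms = set(embed(query))
    return sorted(candidates, key=lambda c: len(q_terms & set(c["vector"])), reverse=True)


# Hypothetical raw knowledge, mirroring the slide's runbook / standards / incident split.
DOCS = {
    "runbook": "When the payments service returns 502 errors restart the gateway pod and check the upstream health probe",
    "standards": "All staging environments must use the approved Terraform modules and carry a cost centre label",
    "incident": "Last outage was caused by an expired certificate on the ingress controller",
}

index = build_index(DOCS)
query = "payments 502 errors"
top = rerank(query, retrieve(query, index, k=2))
print(top[0]["source"], "->", top[0]["text"])
```

Keeping the `source` field on every chunk is what would let the plane apply governance and access control at retrieval time rather than after the fact.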
  14. THE CONCRETE EXAMPLE · ARCHITECTURE + HOW ONE REQUEST RUNS Gemini Enterprise Agent Platform

    A managed component that plugs into your platform, not a replacement for it. It covers the agent-facing planes; your IDP still provides delivery & infrastructure, and stitches it in. Follow one request ① → ⑥ below.

    Covered by GEAP · P1 · P2 · P3 · P6 (partial)
    Your IDP provides · P4 delivery · P5 infra · integration

    MANAGED BY GEAP · THE AGENT PLANE
    P6 Governance: ✦ Agent Registry · Policies & AI protection
    1 Gemini Enterprise: Workspace · Custom apps
    2 ✦ Agent Gateway
    3 P2 ✦ Agent Runtime: Agent · Agent Identity
    4 Memory · Model
    5 ✦ Agent Gateway: Other agents · Tools
    6 P6 Agent observability

    YOUR IDP · THE FOUNDATION IT SITS ON
    P4 · Delivery: CI/CD · PR · policy gates
    P5 · Infra: GKE · Kueue · GPU · net
    Integration: wires GEAP into the golden path

    THE ASK “Stand up a feature-flag service and ship it.”
    ① State intent in Gemini Enterprise
    ② Governed entry via Agent Gateway
    ③ Plan & run with Agent Identity
    ④ Ground & reason on memory + model
    ⑤ Act on tools, governed outbound → IDP delivery
    ⑥ Observe & approve, human-on-the-loop

    Source · Gemini Enterprise Agent Platform overview, Google Cloud documentation (docs.cloud.google.com)
  15. THE GOLDEN-PATH PAYOFF Builders fight problems, not infrastructure.

    Whether they're shipping a model or a service, a self-service path hides Kubernetes and networking, so people stay focused on the work that matters.
  16. PART 2 · ROLE 2 OF 2 AI for Platform

    GenAI as an accelerator for platform operations

    LEVEL 1 · THIS TALK
    Assistant: AI co-authors the work (generate, debug, deliver).

    → LEVEL 2 · THIS TALK
    Autonomy: Agents act, humans approve (Human-on-the-Loop).

    → LEVEL 3 · THE HORIZON
    Autopilot: Human-out-of-the-Loop, beyond today's scope.
  17. MATURITY LEVEL 1 · ASSISTANT · AI-ASSISTED ENGINEERING AI as a co-author

    It doesn't replace the engineer; it accelerates the three things platform work does most. How the loop actually works → next slide.

    Generate: Produce Infrastructure-as-Code straight from intent.
    Debug: Get architecture-aware diagnosis and design suggestions.
    Deliver: Ship faster, and at higher, more consistent quality.
  18. MATURITY LEVEL 1 · ASSISTANT · ANATOMY OF ONE TASK Inside the co-author loop

    Not a chatbot but a grounded, gated codegen loop. The agent reads the platform's own context, and its output is checked before a human ever sees it.

    THE TASK “Provision a staging environment for the payments service.”

    P3 · GROUND · Pull the context: Golden-path modules, live Terraform state, runbooks; not a blank prompt. (Backstage · module registry · RAG)
    → P2 · GENERATE · Synthesize the IaC: The agent runtime writes the manifests and calls tools to do it. (ADK agent · Terraform / Crossplane · MCP servers)
    → P4 P6 · VERIFY · Gate before merge: Diff, policy and security all checked before a human sees it. (terraform plan · OPA · tfsec / Checkov)
    → Human · DELIVER · Open a PR: The engineer reviews a diff, not a prompt, then approves. (GitHub PR · human review)

    Grounded in real context, gated before merge: that's why Level 1 is production-grade, not a demo.

    Source · Platform Engineering 2.0, AI-assisted engineering · Backstage, Terraform, OPA, MCP (2026)
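The ground → generate → verify → deliver loop on this slide can be read as a small pipeline in which nothing reaches a human until every gate passes. The sketch below shows only that control flow. The `Change` record, the two gate functions, and the sample knowledge are hypothetical stand-ins: a real loop would call a model for generation and tools such as `terraform plan`, OPA, or Checkov for verification.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """One unit of work moving through the loop (hypothetical record)."""
    task: str
    context: list = field(default_factory=list)
    manifest: str = ""
    findings: list = field(default_factory=list)


def ground(change, knowledge):
    """P3: attach platform context whose key appears in the task text."""
    change.context = [snippet for key, snippet in knowledge.items() if key in change.task.lower()]
    return change


def generate(change):
    """P2: stand-in for the agent; here it simply assembles the grounded snippets."""
    change.manifest = "\n".join(change.context)
    return change


def require_golden_module(manifest):
    """Gate: the manifest must reference a module block."""
    return None if 'module "' in manifest else "no golden-path module referenced"


def forbid_public_access(manifest):
    """Gate: reject anything that switches on public access."""
    return "public access is not allowed" if "public = true" in manifest else None


def verify(change, gates):
    """P4/P6: run every gate and collect all findings, not just the first."""
    change.findings = [msg for msg in (gate(change.manifest) for gate in gates) if msg]
    return change


def deliver(change):
    """Only a change with no findings becomes a pull request for review."""
    if change.findings:
        return "blocked: " + "; ".join(change.findings)
    return "PR opened for human review"


def run(task, knowledge, gates):
    return deliver(verify(generate(ground(Change(task=task), knowledge)), gates))


GATES = [require_golden_module, forbid_public_access]
KNOWLEDGE = {
    "staging": 'module "staging_env" { source = "golden-path/staging" }',
    "payments": 'labels = { team = "payments" }',
}
TASK = "Provision a staging environment for the payments service."

print(run(TASK, KNOWLEDGE, GATES))
```

Collecting every finding, rather than stopping at the first failure, means a blocked change reports all of its problems in one pass.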
  19. BEFORE YOU HAND OVER THE KEYS · THE TRUST GATE The planes you built are now the leash & the dashboard

    Nothing new to build here. Two planes from Part 1, the agent plane and observability, are exactly what make agent autonomy safe to trust.

    ← BUILT IN PART 1 · P2 · THE LEASH
    An agent acts only on its leash. Identity proves who it is; the registry bounds what it may call. No identity, no action.
    Agent Identity · Registry · Runtime Governance · ADK · MCP

    ← BUILT IN PART 1 · P6 · THE DASHBOARD
    The signal you trust it on:
    Token cost per task, per team
    TTFT (time to first token) latency
    TPOT (time per output token) throughput
    Drift
    LLM-as-a-judge
    Not infra metrics: the live read on whether an autonomous action is worth approving.

    Both built in Part 1; together they're what lets you hand over the loop. → Living architecture

    Source · Platform Engineering 2.0 agent plane · LLM observability: Langfuse, Arize, Splunk (2026)
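The dashboard signals above reduce to simple arithmetic over request timestamps and token counts. The sketch below uses the common definitions: TTFT is the delay from the request to the first streamed token, and TPOT is the average gap between consecutive output tokens. The price constant, the record fields, and the function names are assumptions for illustration, not values from any observability tool.

```python
from collections import defaultdict

# Hypothetical price, in arbitrary currency units per 1,000 tokens.
PRICE_PER_1K_TOKENS = 2.0


def ttft(request_at, token_times):
    """Time to first token: delay between the request and the first streamed token."""
    return token_times[0] - request_at


def tpot(token_times):
    """Time per output token: average gap between consecutive tokens after the first."""
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def cost_per_team(calls):
    """Aggregate token spend by team label for showback."""
    totals = defaultdict(float)
    for call in calls:
        totals[call["team"]] += call["tokens"] * PRICE_PER_1K_TOKENS / 1000
    return dict(totals)


# One streamed response: request sent at t=0.0 s, five tokens arriving afterwards.
token_times = [0.5, 0.75, 1.0, 1.25, 1.5]
print("TTFT (s):", ttft(0.0, token_times))
print("TPOT (s):", tpot(token_times))

calls = [
    {"team": "payments", "tokens": 1500},
    {"team": "payments", "tokens": 500},
    {"team": "search", "tokens": 1000},
]
print(cost_per_team(calls))
```

Per-request throughput in tokens per second is the reciprocal of TPOT, which is probably why the slide files TPOT under throughput.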
  20. MATURITY LEVEL 2 · AUTONOMY · FROM EXECUTOR TO APPROVER, HUMAN-ON-THE-LOOP Living architecture in action

    YESTERDAY An engineer reads the logs, hand-writes the Terraform, and pushes it through CI.

    TRIGGER · Anomaly detected: An alert fires in production.
    → P2 ORCHESTRATE · Repair agent triggers: Owns the incident end to end.
    → P3 CONTEXT · Pulls the context: Runbooks, incidents, lineage.
    → P4 SANDBOX · Builds & tests the fix: Kernel-isolated Agent Sandbox.
    → P5 GUARDRAIL · Opens a guarded PR: Through the agent gateway.
    → H APPROVE · Human approves: One click, on the loop.

    The human shifts from executor to approver: Human-on-the-Loop.

    Source · Platform Engineering 2.0, Living Architecture, platformengineering.org (2026)
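The property that makes this flow "on the loop" is that the agent does every step except the last: nothing merges without an explicit human decision, and a fix that fails in the sandbox never reaches the human at all. A minimal sketch of that control flow follows. The function names and callbacks are hypothetical; the sandbox and approval steps are plain callables standing in for a real isolated runtime and a real PR review.

```python
def run_repair(alert, sandbox_test, human_approves):
    """Walk the slide's steps; return whether the fix merged, plus an audit log."""
    log = [
        f"trigger: {alert}",
        "orchestrate: repair agent started",
        "context: runbooks, incidents, lineage pulled",
    ]
    # A failing fix stops here and is never shown to the approver.
    if not sandbox_test():
        log.append("sandbox: fix failed tests, stopping")
        return {"merged": False, "log": log}
    log.append("sandbox: fix built and tested")
    log.append("guardrail: guarded PR opened")
    # The human decision is the only path to a merge.
    approved = bool(human_approves(log))
    log.append("approve: approved" if approved else "approve: rejected")
    return {"merged": approved, "log": log}


asked = []


def reviewer(log):
    """Hypothetical approver that records being asked, then approves."""
    asked.append(len(log))
    return True


ok = run_repair("p99 latency anomaly", sandbox_test=lambda: True, human_approves=reviewer)
failed = run_repair("p99 latency anomaly", sandbox_test=lambda: False, human_approves=reviewer)
print(ok["merged"], failed["merged"], len(asked))
```

The audit log handed to the approver is the same record that would feed the observability plane, so the human reviews what the agent actually did rather than a summary of it.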
  21. EMBEDDED FINOPS · THE MECHANISM Cost becomes a policy decision

    Three control points turn spend from a month-end surprise into a guardrail enforced before anything runs.

    01 Meter: Every LLM call routes through a model gateway; tokens are counted, team-labelled, and streamed to BigQuery for showback. (LiteLLM / Apigee · BigQuery chargeback)
    → 02 Quota: GPU capacity is bounded by Kueue ResourceFlavors & quotas; estimated cost shows up right in the PR. (Kueue quota · MIG · spot · scale-to-zero)
    → 03 Gate: The pre-deploy cost gate is an admission controller; an over-budget manifest is rejected before it reaches the cluster. (OPA Gatekeeper / Kyverno)

    Cost stops being a month-end reconciliation; it's a policy decision at admission time.

    Source · Platform Engineering 2.0 FinOps · Kueue, OPA Gatekeeper, Kyverno, LiteLLM (2026)
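The third control point, the pre-deploy cost gate, amounts to an estimate-and-compare check run at admission time. The sketch below shows the logic in plain Python. In practice this would be expressed as an OPA Gatekeeper or Kyverno policy; the hourly prices, budget figures, and manifest shape here are all invented for illustration.

```python
# Hypothetical hourly prices per requested unit; real values come from the cloud bill.
HOURLY_PRICE = {"gpu": 2.50, "cpu": 0.04}
HOURS_PER_MONTH = 730


def estimate_monthly_cost(manifest):
    """Estimate a month of spend from the resources a manifest requests."""
    total = 0.0
    for resource, count in manifest["requests"].items():
        total += HOURLY_PRICE[resource] * count * manifest["replicas"] * HOURS_PER_MONTH
    return total


def admit(manifest, budgets):
    """Admission-style decision: allow only if the estimate fits the team budget."""
    cost = estimate_monthly_cost(manifest)
    budget = budgets.get(manifest["team"], 0.0)  # unknown teams get no budget
    allowed = cost <= budget
    verdict = "within" if allowed else "over"
    return {
        "allowed": allowed,
        "estimated_cost": round(cost, 2),
        "message": f"estimated {cost:.2f}/month is {verdict} the {budget:.2f} budget",
    }


BUDGETS = {"ml": 3000.0}  # hypothetical monthly budget per team

training_job = {"team": "ml", "replicas": 2, "requests": {"gpu": 1}}
small_service = {"team": "ml", "replicas": 1, "requests": {"cpu": 4}}

print(admit(training_job, BUDGETS))
print(admit(small_service, BUDGETS))
```

The same estimate can be posted on the pull request, which is how the cost becomes visible before the admission decision is ever made.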
  22. SECURITY SHIFTS DOWN · DEFENSE IN DEPTH Four gates contain a probabilistic system

    New threats (shadow AI, prompt injection, model poisoning, agent exfiltration) outrun shift-left checklists. The platform answers with four gates, invisible to the developer.

    01 Identity: Non-human workload identity per agent, with short-lived mTLS certs. No more shared service accounts. (SPIFFE / SPIRE)
    02 Gateway: Every in/outbound message filtered; prompt-injection & DLP caught at the edge of the agent. (Agent Gateway · Model Armor)
    03 Authorization: Tools are allow-listed; an agent can only call what's been registered, nothing else. (MCP Gateway · Agent Registry)
    04 Isolation: Agent-run code executes in a kernel-isolated sandbox; a compromised agent can't break out. (gVisor sandbox)

    We don't trust the AI to behave; we use deterministic infrastructure to contain a probabilistic system.

    Source · SPIFFE/SPIRE · Model Armor · MCP · gVisor · Platform Engineering 2.0 (2026)
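Gates 01 and 03 combine into a deny-by-default check: a call proceeds only if the caller has a known identity and the tool is registered for that identity. A minimal sketch follows. The registry contents, tool names, and `call_tool` function are hypothetical; the identifier follows the SPIFFE ID URI form (`spiffe://trust-domain/path`), but no SPIRE or MCP gateway API is being called here.

```python
class ToolNotAllowed(Exception):
    """Raised when an agent has no identity or calls an unregistered tool."""


# Hypothetical registry: agent identity -> the only tools it may call.
REGISTRY = {
    "spiffe://example.org/agent/repair": {"read_logs", "open_pull_request"},
}

# Hypothetical tool implementations sitting behind the gateway.
TOOLS = {
    "read_logs": lambda: "last 100 log lines",
    "open_pull_request": lambda: "PR opened",
    "delete_cluster": lambda: "cluster deleted",
}


def call_tool(agent_id, tool_name, registry=REGISTRY, tools=TOOLS):
    """Deny by default: no identity, no action; unregistered tool, no call."""
    allowed = registry.get(agent_id)
    if allowed is None:
        raise ToolNotAllowed(f"unknown agent identity: {agent_id}")
    if tool_name not in allowed:
        raise ToolNotAllowed(f"{agent_id} may not call {tool_name}")
    return tools[tool_name]()


repair = "spiffe://example.org/agent/repair"
print(call_tool(repair, "read_logs"))

try:
    call_tool(repair, "delete_cluster")
except ToolNotAllowed as err:
    print("denied:", err)
```

The check is deterministic and lives outside the model, so a prompt-injected agent can ask for `delete_cluster` as often as it likes without the call ever executing.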
  23. THE NEW PARADIGM Two forces, one paradigm

    Platform for AI frees the data scientist's productivity
    ⇄
    AI for Platform expands the ops team's leverage

    Under Platform Engineering 2.0, the two reinforce each other.
  24. KEY TAKEAWAYS Three moves

    01 A philosophy shift, not a bolt-on LLM: Don't discard existing investment; upgrade it into an AI-era foundation: V1 → V2 → V3.
    02 Stand up the Agent & Knowledge planes: MCP gateway, knowledge graph, tool registry, agent identity & governance.
    03 From passive monitoring to active guardrails: Shift security & FinOps down, with deterministic policy-as-code at the foundation.
  25. APPENDIX References & further reading

    Every source cited across the talk, in one place.

    REPORTS & PRIMARY SOURCES
    Platform Engineering 2.0: An evolution for the AI era (2026) · platformengineering.org
    Reference Architecture for an AI/ML Internal Developer Platform on GCP (2026) · platformengineering.org
    State of Platform Engineering, Vol. 4, Weave Intelligence (2025) · platformengineering.org
    Gemini Enterprise Agent Platform overview, Google Cloud docs · cloud.google.com
    2026 State of Software Delivery, CircleCI · circleci.com/resources
    The new bottleneck, Stack Overflow Blog (2026) · stackoverflow.blog

    TOOLS & PLATFORMS CITED
    Backstage · backstage.io
    Terraform · terraform.io
    OPA / Gatekeeper · openpolicyagent.org
    Kyverno · kyverno.io
    Model Context Protocol · modelcontextprotocol.io
    Kueue · kueue.sigs.k8s.io
    LiteLLM · litellm.ai
    SPIFFE / SPIRE · spiffe.io
    Model Armor · cloud.google.com
    gVisor · gvisor.dev
    Langfuse · langfuse.com
    Arize · arize.com
    Splunk · splunk.com