Kevin Dubois
June 04, 2026

DevTalks Romania - The Self-Healing Canary: Integrating Agentic AIOps into Your Releases

Transcript

  1. Kevin Dubois ★ Sr. Principal Developer Advocate at ★ Java Champion ★ Technical Lead, CNCF DevEx TAG ★ From Belgium / Live in Switzerland ★ 🗣 English, Dutch, French, Italian

    youtube.com/@thekevindubois
    github.com/kdubois
    @kevindubois.com
    linkedin.com/in/kevindubois
  2. CI / CD

    Pipeline stages: Build, Test, Security Checks, Release, Deploy Stage, Deploy Prod
    Diagram labels: Continuous Integration, Continuous Delivery, Manual
  3. CI - CD - CD

    Pipeline stages: Build, Test, Security Checks, Release, Deploy Stage, Deploy Prod
    Diagram labels: Continuous Integration, Continuous Delivery, Continuous Deployment, Manual, Auto
  4. GitOps Application Delivery Model

    Diagram labels: Push, Pull, Pull Request, Source Git Repository, Image Registry, Config Git Repository, Kubernetes, Deploy, Monitor, Detect drift, CD, Take action
  5. Metrics Based Rollouts

    strategy:
      canary:
        analysis:
          args:
          - name: service-name
            value: rollouts-demo-canary.canary.svc.cluster.local
          templates:
          - templateName: success-rate
        canaryService: rollouts-demo-canary
        stableService: rollouts-demo-stable
        trafficRouting:
          istio:
            virtualService:
              name: rollout-vsvc
              routes:
              - primary
        steps:
        - setWeight: 30
        - pause: { duration: 20s }
        - setWeight: 40
        - pause: { duration: 10s }
        - setWeight: 60
        - pause: { duration: 10s }
        - setWeight: 80
        - pause: { duration: 5s }
        - setWeight: 90
        - pause: { duration: 5s }
        - setWeight: 100
        - pause: { duration: 5s }
  6. apiVersion: argoproj.io/v1alpha1
     kind: AnalysisTemplate
     metadata:
       name: success-rate
     spec:
       args:
       - name: service-name
       metrics:
       - name: success-rate
         interval: 10s
         successCondition: len(result) == 0 || result[0] >= 0.95
         failureLimit: 2
         provider:
           prometheus:
             address: https://internal:[email protected] .local:9090
             query: |
               sum(irate(istio_requests_total{
                 reporter="source",
                 destination_service=~"{{args.service-name}}",
                 response_code!~"5.*"}[30s])
               )
  7. apiVersion: argoproj.io/v1alpha1
     kind: RolloutManager
     metadata:
       name: argo-rollouts
     spec:
       plugins:
         metric:
         - name: argoproj-labs/metric-ai
           location: https://github.com/argoproj-labs/rollouts-plugin-metric-ai/releases/download/v0.0.1/rollouts-plugin-metric-ai-linux-amd64
  8. apiVersion: argoproj.io/v1alpha1
     kind: AnalysisTemplate
     metadata:
       name: canary-analysis-ai-agent
     spec:
       metrics:
       - interval: 10s
         name: success-rate
         provider:
           plugin:
             argoproj-labs/metric-ai:
               agentUrl: http://kubernetes-agent:8080
               stableLabel: role=stable
               canaryLabel: role=canary
               extraPrompt: ignore aesthetic changes
         successCondition: result > 0.50
  9. apiVersion: argoproj.io/v1alpha1
     kind: AnalysisTemplate
     metadata:
       name: canary-analysis-ai-agent
     spec:
       metrics:
       - interval: 10s
         name: success-rate
         provider:
           plugin:
             argoproj-labs/metric-ai:
               agentUrl: http://kubernetes-agent:8080
               stableLabel: role=stable
               canaryLabel: role=canary
               extraPrompt: ignore aesthetic changes
               githubUrl: …github.com/kdubois/argo-rollouts-quarkus-demo
  10. Lessons learned:

    - Performance was an interesting challenge: went from one AI service to an agentic system with parallel & async agents
    - LLM choice + “context engineering” + tool calling, especially for PR creation
    - Complexity vs portability (e.g. could’ve used Serverless MCP, external code assistant for PR creation, distributed agents, etc.)
  11. Takeaways:

    - Rolling out changes to all users at once is risky
    - Canary rollouts and feature flags are safer
    - AI Agents can automate the loop by analyzing metrics and logs, and even proposing fixes for the failures
    - AI != Python
    - Java with Quarkus LangChain4j is super powerful for real, production-grade agentic AI systems.
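
The stepped canary on slide 5, gated by the `failureLimit: 2` from slide 6, can be pictured with a small plain-Java simulation. This is only a sketch under stated assumptions, not how Argo Rollouts is implemented: the class and method names are hypothetical, the check runs once per step (the real analysis runs on its own 10s interval), and the rollout is assumed to abort once the number of failed checks exceeds the failure limit. Requires Java 17 or later.

```java
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical simulation of a stepped canary rollout gated by a success-rate check. */
class CanarySimulation {

    /** One canary step: traffic weight in percent and the pause that follows it, in seconds. */
    record Step(int weight, int pauseSeconds) {}

    /** Result of a simulated rollout. finalWeight is 0 when the rollout was aborted. */
    record Outcome(boolean promoted, int finalWeight, int failedChecks) {}

    // Weights and pauses copied from the Rollout on slide 5.
    static final List<Step> SLIDE_STEPS = List.of(
            new Step(30, 20), new Step(40, 10), new Step(60, 10),
            new Step(80, 5), new Step(90, 5), new Step(100, 5));

    /** Sum of all pauses: the shortest time a fully healthy rollout can take. */
    static int totalPauseSeconds(List<Step> steps) {
        return steps.stream().mapToInt(Step::pauseSeconds).sum();
    }

    /**
     * Walks the steps in order. successRateAt is a stand-in for the metric query:
     * it returns the observed success rate (0.0 to 1.0) at a given canary weight.
     * Assumption: the rollout aborts once failed checks exceed failureLimit.
     */
    static Outcome run(List<Step> steps, IntToDoubleFunction successRateAt,
                       double threshold, int failureLimit) {
        int failed = 0;
        int weight = 0;
        for (Step step : steps) {
            weight = step.weight();
            if (successRateAt.applyAsDouble(weight) < threshold) {
                failed++;
                if (failed > failureLimit) {
                    // Abort: all traffic goes back to the stable version.
                    return new Outcome(false, 0, failed);
                }
            }
        }
        return new Outcome(true, weight, failed);
    }

    public static void main(String[] args) {
        // Healthy canary: every check passes and the rollout reaches 100%.
        System.out.println(run(SLIDE_STEPS, w -> 0.99, 0.95, 2));
        // Canary degrades under load: checks fail from 60% traffic onwards.
        System.out.println(run(SLIDE_STEPS, w -> w >= 60 ? 0.80 : 0.99, 0.95, 2));
        System.out.println("Minimum duration: " + totalPauseSeconds(SLIDE_STEPS) + "s");
    }
}
```

In the second run the checks at 60%, 80% and 90% fail; the third failure exceeds the limit of 2, so the simulated rollout aborts before any traffic step reaches 100%.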
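
The `successCondition` on slide 6, `len(result) == 0 || result[0] >= 0.95`, treats an empty query result (no traffic observed yet) as a pass. The following Java sketch mirrors that logic. The helper names are hypothetical, and it assumes the metric is meant as the share of non-5xx requests among all requests; the query on the slide shows only the non-5xx rate.

```java
import java.util.List;

/** Hypothetical mirror of the slide-6 success condition in plain Java. */
class SuccessCondition {

    /**
     * Success rate as the share of non-5xx requests.
     * Returns an empty list when no requests were observed, like an empty query result.
     */
    static List<Double> successRate(long non5xxRequests, long totalRequests) {
        if (totalRequests == 0) {
            return List.of();
        }
        return List.of((double) non5xxRequests / totalRequests);
    }

    /** Equivalent of: len(result) == 0 || result[0] >= threshold. */
    static boolean passes(List<Double> result, double threshold) {
        return result.isEmpty() || result.get(0) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(passes(successRate(0, 0), 0.95));    // no traffic yet
        System.out.println(passes(successRate(98, 100), 0.95)); // 2% errors
        System.out.println(passes(successRate(90, 100), 0.95)); // 10% errors
    }
}
```

Passing on an empty result avoids failing a canary that has not received traffic yet, at the cost of not detecting a canary that never receives any.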
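
The "parallel & async agents" lesson on slide 10 can be illustrated with `CompletableFuture` from the Java standard library. This is a minimal sketch, not the talk's Quarkus LangChain4j implementation: each "agent" is a hypothetical stand-in that returns a confidence score between 0.0 and 1.0, the scores are averaged (an assumption made for illustration), and the result is compared against the `result > 0.50` condition from slide 8. Requires Java 17 or later.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical fan-out of independent analysis agents, run concurrently. */
class ParallelAgents {

    /** Starts every agent asynchronously, then waits for all and averages their scores. */
    static double analyze(List<Supplier<Double>> agents) {
        // First pass: start all agents so they run in parallel on the common pool.
        List<CompletableFuture<Double>> running = agents.stream()
                .map(agent -> CompletableFuture.supplyAsync(agent))
                .toList();
        // Second pass: block until each one has finished and combine the scores.
        return running.stream()
                .mapToDouble(CompletableFuture::join)
                .average()
                .orElse(0.0);
    }

    /** Mirrors the slide-8 condition: result > 0.50. */
    static boolean promote(double score) {
        return score > 0.50;
    }

    public static void main(String[] args) {
        // Stand-ins for agents that would inspect metrics and logs.
        Supplier<Double> metricsAgent = () -> 0.9;
        Supplier<Double> logsAgent = () -> 0.7;
        double score = analyze(List.of(metricsAgent, logsAgent));
        System.out.println("score=" + score + " promote=" + promote(score));
    }
}
```

Collecting the futures into a list before joining matters: joining inside the same stream pipeline that starts them would run the agents one after another.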