Drift Happens! 3 Kubernetes Drift Scenarios & How to Overcome Them

Drift Happens! Kubernetes Drift Scenarios & How to Overcome Them
Tuesday April 22nd, 2025

Agenda
• Housekeeping & Introductions
• Why Drift Happens
• The Impact of Drift on Actual Environments
• Best Practices and Strategies
• Q and A

Housekeeping
• Yes, this webinar is recorded
• Use the Q+A section to ask questions
• ~45 minutes

Meet our speakers 👋
• Ilan Adler, Komodor PMM
• Chen Kubani, Product Manager

A Quick Poll!
• How does your team primarily detect potential configuration drift in Kubernetes today?

A Quick Poll!
A) Manual Checks: comparing manifests, kubectl diff, regular reviews.
B) Reactively: usually only discovered when investigating an incident or failure.
C) Using built-in features of GitOps tools (like Argo CD, Flux).
D) We don't have a specific or consistent process for detecting drift.

Common Drift Concerns
• “We can’t track who changed what across our clusters”
• “Configuration drift between clusters is a constant problem”
• “Our GitOps workflow breaks down when changes that were meant for DEV end up in PROD”

Why Does Drift Happen?
K8s Estate Increases
• More clusters, more services: more issues and headaches
Manual Changes & Control
• Break-glass mechanisms are important but can be debilitating
Deployment Issues
• Large-scale, complex Kubernetes environments can suffer from inconsistent deployments “drifting” from baseline configurations

Tales of the Drift

01 Configuration Drift Across Environments
Inconsistent Behavior in a Service
• A service is deployed across two regions, Prod EU and Prod US, but runs smoothly only in EU.
The Culprit - Inconsistent Memory Limits
• Caused by a misconfiguration during deployment.
The Cost - 1 Hour of Troubleshooting
• It took the team an hour to identify the issue.
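The kind of check that could have shortened this hour can be sketched in a few lines. The example below is a minimal illustration, not the tooling used in the webinar: it compares container memory limits between two Deployment-shaped dictionaries. The manifest contents, the container name `api`, and the values `512Mi` and `256Mi` are hypothetical.

```python
from typing import Dict, Tuple


def memory_limits(deployment: dict) -> Dict[str, str]:
    """Map container name -> memory limit for a Deployment-shaped dict."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory", "<unset>")
        for c in containers
    }


def diff_memory_limits(env_a: dict, env_b: dict) -> Dict[str, Tuple[str, str]]:
    """Return containers whose memory limits differ between two environments."""
    a, b = memory_limits(env_a), memory_limits(env_b)
    return {
        name: (a.get(name, "<missing>"), b.get(name, "<missing>"))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


def make_deployment(memory: str) -> dict:
    # Hypothetical manifest, trimmed to the fields this sketch reads.
    return {"spec": {"template": {"spec": {"containers": [
        {"name": "api", "resources": {"limits": {"memory": memory}}}
    ]}}}}


prod_eu = make_deployment("512Mi")  # assumed value for illustration
prod_us = make_deployment("256Mi")  # assumed value for illustration
print(diff_memory_limits(prod_eu, prod_us))
```

In practice the two dictionaries would come from the live clusters (for example, parsed from exported manifests) rather than being built by hand.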

02 Managing a Large K8s Fleet
Degraded Cluster Performance
• Managing hundreds of services across multiple clusters.
The Culprit - Outdated Container Image
• An incomplete deployment process left the cluster with an outdated image.
The Cost - 4 Hours of Analysis
• Multiple team members spent hours trying to detect the root cause of performance issues.
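A fleet-wide image check is one way to surface this class of drift early. The sketch below assumes a snapshot mapping each cluster to the image it currently runs for one service; the cluster names, registry, and tags are hypothetical.

```python
from typing import Dict, List


def outdated_clusters(expected_image: str, running: Dict[str, str]) -> List[str]:
    """Return the clusters whose running image differs from the expected one."""
    return sorted(
        cluster for cluster, image in running.items() if image != expected_image
    )


# Hypothetical fleet snapshot for a single service.
running_images = {
    "cluster-a": "registry.example.com/checkout:1.8.0",
    "cluster-b": "registry.example.com/checkout:1.7.2",
    "cluster-c": "registry.example.com/checkout:1.8.0",
}

print(outdated_clusters("registry.example.com/checkout:1.8.0", running_images))
```

Comparing full image references rather than tags alone also catches a cluster pulling from a different registry.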

03 GitOps Workflow Service Reliability Issues
Pod Crashes for a Critical Service
• Started with a new feature rollout.
The Culprit - Liveness Probes Incorrectly Configured
• A container image with non-prod configurations was deployed due to GitOps workflows.
The Cost - 1 Full Day to Recover
• It took the developer and an escalated SRE to identify and remediate the issue.
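A probe check against a per-environment baseline illustrates how non-prod settings could be flagged before they reach production. This is a sketch under assumptions: the field names are standard Kubernetes probe fields, but the baseline and deployed values are invented for the example.

```python
from typing import Any, Dict, Tuple


def probe_drift(expected: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return probe fields whose deployed value differs from the baseline.

    Each entry maps field name -> (expected value, actual value).
    """
    return {
        field: (expected.get(field), actual.get(field))
        for field in sorted(set(expected) | set(actual))
        if expected.get(field) != actual.get(field)
    }


# Hypothetical production baseline for the liveness probe.
prod_baseline = {"initialDelaySeconds": 30, "periodSeconds": 10, "failureThreshold": 3}
# Hypothetical values shipped with the non-prod configuration.
deployed_probe = {"initialDelaySeconds": 5, "periodSeconds": 10, "failureThreshold": 1}

print(probe_drift(prod_baseline, deployed_probe))
```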

Understanding the Full Impact of Drift
Performance and Stability Issues
• Degraded service performance
• Increased failure rates and downtime
• Longer troubleshooting time due to hard-to-detect configuration discrepancies
Security Issues
• Vulnerabilities from outdated or misconfigured services
Cost and Inefficiency Issues
• Services running misaligned configurations can impact cloud costs

Recommendations and Techniques
Drift happens; your ability to detect and react defines your resilience. Here are key strategies to proactively manage and reduce the risk of drift:
Set Guardrails where Possible
• Use policies and automation to limit risky manual changes and enforce best practices.
Move towards GitOps
• Use Git as the single source of truth for configurations. GitOps ensures visibility, consistency, and accountability across environments.
Automate Everything
• Proactively catch misconfigurations with automated alerts and self-healing mechanisms to reduce MTTR.
Integrate Drift into Troubleshooting
• Treat drift checks as a default part of incident response; doing so can dramatically speed up root cause identification.
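A drift check that can run as a routine incident-response step might look like the sketch below. It assumes the desired state (from Git) and the live state are already loaded as nested dictionaries, and it walks only the fields declared in the desired state so that cluster-populated fields such as `status` are ignored. The function name `find_drift` and the sample manifests are hypothetical.

```python
from typing import Any, Iterator, Tuple


def find_drift(desired: Any, live: Any, path: str = "") -> Iterator[Tuple[str, Any, Any]]:
    """Yield (path, desired, live) for each declared field that differs in the live state.

    Only keys present in `desired` are checked; lists are compared as whole values.
    """
    if isinstance(desired, dict) and isinstance(live, dict):
        for key in sorted(desired):
            child = f"{path}.{key}" if path else key
            if key not in live:
                yield (child, desired[key], None)
            else:
                yield from find_drift(desired[key], live[key], child)
    elif desired != live:
        yield (path, desired, live)


# Hypothetical desired state, as it might be stored in Git.
desired_state = {"spec": {"replicas": 3, "template": {"spec": {"containers": [
    {"name": "api", "image": "api:2.0"}
]}}}}
# Hypothetical live state after a manual scale-up.
live_state = {"spec": {"replicas": 5, "template": {"spec": {"containers": [
    {"name": "api", "image": "api:2.0"}
]}}}, "status": {"readyReplicas": 5}}

for field, want, got in find_drift(desired_state, live_state):
    print(f"{field}: desired={want!r} live={got!r}")
```

Checking only declared fields is a deliberate design choice here: it avoids reporting server-side defaults as drift, at the cost of missing fields that were added manually and never declared in Git.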

Winning the Battle Against K8s Drift!
Accelerate Troubleshooting & Recovery
• Immediately identify root cause, and quickly resolve it.
• Easily edit the desired state, and enforce best practices with all resource types.
Easy to Use Visual Experience
• Intuitive and user-friendly view with detailed insights.
• Diff-only mode for changes in multiple services.
Detect Discrepancies
• Compare versions and resources on your Helm charts.
• Keep service configurations uniform across complex K8s environments.
• Flag deviations as reliability risks and standardize configs across the fleet.
Automate Drift Detection
• Automatically detect and remediate.
• Connect to GitOps tooling to maintain a consistent source of truth.

Demo Time

Questions?

Thank You

Komodor
