
Kubernetes monitoring 101


In this talk, I describe some common issues in a Kubernetes cluster and the metrics you should monitor to troubleshoot them.


Sergio Moya

July 16, 2018


Transcript

  1. ©2008–18 New Relic, Inc. All rights reserved Kubernetes Monitoring 101

    Contain the Complexity of Kubernetes Sergio Moya - Senior Software Engineer @ New Relic
  2. ©2008–18 New Relic, Inc. All rights reserved • Why monitoring

    is a must. • What Needs to be Monitored in Kubernetes • Metric sources • How to monitor • Q&A Agenda
  3. ©2008–18 New Relic, Inc. All rights reserved What Needs to be Monitored in Kubernetes?

    Cluster • Nodes • Applications • Pods/Deployments • Containers • And more...
  4. ©2008–18 New Relic, Inc. All rights reserved • What is

    the size of my Kubernetes cluster? • How many nodes, namespaces, deployments, pods, containers do I have running in my Cluster? Cluster Admin Cluster
  5. ©2008–18 New Relic, Inc. All rights reserved Cluster

    MONITORING FOR: Cluster Overview • What is the size of my Kubernetes cluster? • How many nodes, namespaces, deployments, pods, and containers do I have running in my cluster? Cluster Admin WHAT • Snapshot of the objects that make up a cluster WHY • Kubernetes is managed by various teams, including SREs, sysadmins, and developers, so it can be difficult to keep track of the current state of a cluster
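A minimal sketch of answering these cluster-overview questions against the Kubernetes API, assuming the official kubernetes Python client and a working kubeconfig:

    from kubernetes import client, config

    # Count the main object types in the cluster.
    config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
    core = client.CoreV1Api()
    apps = client.AppsV1Api()

    nodes = core.list_node().items
    namespaces = core.list_namespace().items
    deployments = apps.list_deployment_for_all_namespaces().items
    pods = core.list_pod_for_all_namespaces().items
    containers = sum(len(p.spec.containers) for p in pods)

    print(f"nodes={len(nodes)} namespaces={len(namespaces)} "
          f"deployments={len(deployments)} pods={len(pods)} containers={containers}")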
  6. ©2008–18 New Relic, Inc. All rights reserved • Do we

    have enough nodes in our cluster? • Are the resource requirements of the deployed applications overbooked relative to the existing nodes? Node Operations
  7. ©2008–18 New Relic, Inc. All rights reserved Node

    MONITORING FOR: Node resource consumption WHAT • Resource consumption (used cores, used memory) for each Kubernetes node • Total vs. used memory WHY • Ensure that your cluster remains healthy • Ensure new deployments will succeed and not be blocked by a lack of resources • Do we have enough nodes in our cluster? • Are the resource requirements of the deployed applications overbooked relative to the existing nodes? Operations
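One way to check whether application resource requests overbook the existing nodes is to compare each node's allocatable CPU with the sum of the CPU requests of the pods scheduled on it. A sketch, again assuming the official kubernetes Python client (the quantity parser below is deliberately simplified):

    from collections import defaultdict
    from kubernetes import client, config

    def cpu_to_millicores(quantity):
        # Simplified: handles "250m" and whole cores such as "2" or "3.5".
        return int(quantity[:-1]) if quantity.endswith("m") else int(float(quantity) * 1000)

    config.load_kube_config()
    core = client.CoreV1Api()

    requested = defaultdict(int)  # node name -> total requested CPU in millicores
    for pod in core.list_pod_for_all_namespaces().items:
        if pod.spec.node_name and pod.status.phase in ("Running", "Pending"):
            for c in pod.spec.containers:
                cpu_req = ((c.resources and c.resources.requests) or {}).get("cpu")
                if cpu_req:
                    requested[pod.spec.node_name] += cpu_to_millicores(cpu_req)

    for node in core.list_node().items:
        allocatable = cpu_to_millicores(node.status.allocatable["cpu"])
        used = requested[node.metadata.name]
        print(f"{node.metadata.name}: {used}m requested of {allocatable}m allocatable "
              f"({100 * used / allocatable:.0f}%)")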
  8. ©2008–18 New Relic, Inc. All rights reserved • Are things

    working the way I expect them to? • Are my apps running and healthy? Pods Operations
  9. ©2008–18 New Relic, Inc. All rights reserved MONITORING

    FOR: Pods not running WHY • Missing pods may indicate: ◦ Insufficient resources to schedule a pod ◦ Unhealthy pods: failing liveness or readiness probes ◦ Others • Are things working the way I expect them to? • Are my apps running and healthy? Operations Pods/Deployments WHAT • The number of current pods in a Deployment should match the desired count.
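The "current vs. desired" check can be expressed directly against the API. A sketch, assuming the official kubernetes Python client, that flags Deployments whose ready replicas lag behind the desired count and pods stuck outside the Running/Succeeded phases:

    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()
    core = client.CoreV1Api()

    # Deployments where ready replicas < desired replicas.
    for dep in apps.list_deployment_for_all_namespaces().items:
        desired = dep.spec.replicas or 0
        ready = dep.status.ready_replicas or 0
        if ready < desired:
            print(f"{dep.metadata.namespace}/{dep.metadata.name}: {ready}/{desired} ready")

    # Pods that are not Running or Succeeded (i.e. Pending, Failed, Unknown).
    for pod in core.list_pod_for_all_namespaces().items:
        if pod.status.phase not in ("Running", "Succeeded"):
            print(f"{pod.metadata.namespace}/{pod.metadata.name}: {pod.status.phase}")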
  10. ©2008–18 New Relic, Inc. All rights reserved • Are my

    containers hitting their resource limits and affecting application performance? • Are there spikes in resource consumption? • Are there any containers in a restart loop? • How many container restarts have there been in X amount of time? Containers DevOps
  11. ©2008–18 New Relic, Inc. All rights reserved MONITORING

    FOR: Container Resource Usage WHY • If a container hits its CPU limit, the application's performance will be affected (CPU throttling) • If a container hits its memory limit, Kubernetes may terminate or restart it • Are my containers hitting their resource limits and affecting application performance? • Are there spikes in resource consumption? DevOps Containers WHAT • Resource request: the minimum amount of a resource guaranteed by the scheduler • Resource limit: the maximum amount of a resource the container is allowed to consume
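Since requests and limits live in the pod spec, it is easy to spot containers that have no limit at all and could therefore consume resources unchecked. A sketch with the official kubernetes Python client:

    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    # Containers with no CPU or memory limit: their consumption is unbounded
    # and can affect neighbouring containers on the same node.
    for pod in core.list_pod_for_all_namespaces().items:
        for c in pod.spec.containers:
            limits = (c.resources and c.resources.limits) or {}
            missing = [r for r in ("cpu", "memory") if r not in limits]
            if missing:
                print(f"{pod.metadata.namespace}/{pod.metadata.name}/{c.name}: "
                      f"no limit for {', '.join(missing)}")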
  12. ©2008–18 New Relic, Inc. All rights reserved MONITORING

    FOR: Container Restarts WHY • Under normal conditions, container restarts should not happen • A restart indicates an issue with either the container itself or the underlying host • Are there any containers in a restart loop? • How many container restarts have there been in X amount of time? DevOps Containers WHAT • A container can be restarted when it crashes or when its memory usage reaches the defined limit
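Restart counts are exposed per container in the pod status, so a restart-loop check can be as simple as the sketch below (the threshold is a hypothetical value, not something the deck prescribes):

    from kubernetes import client, config

    RESTART_THRESHOLD = 3  # hypothetical alerting threshold

    config.load_kube_config()
    core = client.CoreV1Api()

    for pod in core.list_pod_for_all_namespaces().items:
        for status in (pod.status.container_statuses or []):
            if status.restart_count >= RESTART_THRESHOLD:
                print(f"{pod.metadata.namespace}/{pod.metadata.name}/{status.name}: "
                      f"{status.restart_count} restarts")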
  13. ©2008–18 New Relic, Inc. All rights reserved • What and

    how many services does my cluster have? • What is the current status of my Horizontal Pod Autoscalers? • Are my Persistent Volumes well provisioned? • Etc. Others You
  14. ©2008–18 New Relic, Inc. All rights reserved Metric sources •

    Kubernetes API • kube-state-metrics • Heapster (deprecated) • Metrics Server • Kubelet and cAdvisor
  15. ©2008–18 New Relic, Inc. All rights reserved K8s API •

    No third party • Up to date • Can become a bottleneck • Missing critical data, e.g. pod resource usage Pros Cons
  16. ©2008–18 New Relic, Inc. All rights reserved kube-state-metrics • Tons

    of metrics • Well supported • Prometheus format • No data about pods that have not been scheduled yet • Only state, no resource usage Pros Cons
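kube-state-metrics serves its data in the Prometheus text format over plain HTTP, so any scraper can read it. A sketch that assumes the default Service has been port-forwarded locally (for example with kubectl port-forward svc/kube-state-metrics 8080:8080) and filters one deployment-health metric (kube_deployment_status_replicas_unavailable):

    import requests

    METRICS_URL = "http://localhost:8080/metrics"  # assumed port-forwarded endpoint

    text = requests.get(METRICS_URL, timeout=5).text
    for line in text.splitlines():
        # Surface deployments reporting unavailable replicas (non-zero samples).
        if line.startswith("kube_deployment_status_replicas_unavailable") and not line.endswith(" 0"):
            print(line)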
  17. ©2008–18 New Relic, Inc. All rights reserved Heapster • Tons

    of metrics • Different backends (sinks) • Exposes Prometheus format • Plug&Play • No Prometheus backend (sink) • Resource consumption • Some sinks are not maintained • Deprecated (k8s >=v1.13.0) Pros Cons
  18. ©2008–18 New Relic, Inc. All rights reserved Metrics Server •

    Implements the K8s Metrics API standard • Official • Only a few metrics (CPU & memory) • Early stage (incubator) Pros Cons
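Because Metrics Server implements the metrics.k8s.io aggregated API, its CPU and memory figures can be read through the ordinary Kubernetes client. A sketch assuming Metrics Server is installed and a working kubeconfig:

    from kubernetes import client, config

    config.load_kube_config()
    custom = client.CustomObjectsApi()

    # Node CPU/memory usage as reported through the Metrics API.
    node_metrics = custom.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
    for item in node_metrics["items"]:
        usage = item["usage"]
        print(f'{item["metadata"]["name"]}: cpu={usage["cpu"]} memory={usage["memory"]}')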
  19. ©2008–18 New Relic, Inc. All rights reserved Kubelet + cAdvisor

    • No third party • All data about node, pod, and container resources • Distributed by nature • Only data about nodes, pods, and containers • Some data inconsistency between the API and the Kubelet Pros Cons
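The Kubelet's Summary API (backed by cAdvisor) exposes node, pod, and container resource usage per node. A sketch under several assumptions: the node address is a placeholder, the secure kubelet port 10250 is reachable, and the ServiceAccount token is authorised to read node stats:

    import requests

    NODE = "10.0.0.12"  # hypothetical node address
    TOKEN = open("/var/run/secrets/kubernetes.io/serviceaccount/token").read().strip()

    resp = requests.get(
        f"https://{NODE}:10250/stats/summary",
        headers={"Authorization": f"Bearer {TOKEN}"},
        verify=False,  # sketch only; use the cluster CA bundle in practice
        timeout=5,
    )
    summary = resp.json()
    node = summary["node"]
    print("node cpu (nanocores):", node["cpu"]["usageNanoCores"])
    print("node memory (bytes):", node["memory"]["workingSetBytes"])
    for pod in summary.get("pods", []):
        ref = pod["podRef"]
        print(ref["namespace"], ref["name"], pod.get("cpu", {}).get("usageNanoCores"))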
  20. ©2008–18 New Relic, Inc. All rights reserved

    K8s API — Pros: no third party; up to date. Cons: can become a bottleneck; missing critical data, e.g. pod resource usage.
    kube-state-metrics — Pros: tons of metrics; well supported; Prometheus format. Cons: no data about pods that have not been scheduled yet; only state, no resource usage.
    Heapster — Pros: tons of metrics; different backends (sinks); exposes Prometheus format; plug & play. Cons: no Prometheus backend (sink); resource consumption; some sinks are not maintained; deprecated (k8s >= v1.13.0).
    Metrics Server — Pros: implements the K8s Metrics API standard; official. Cons: only a few metrics (CPU & memory); early stage (incubator).
    Kubelet + cAdvisor — Pros: no third party; all data about node, pod, and container resources; distributed by nature. Cons: only data about nodes, pods, and containers; some data inconsistency between the API and the Kubelet.
  21. ©2008–18 New Relic, Inc. All rights reserved Custom solutions •

    Deployment of pods fetching metrics from any of the sources • DaemonSet fetching metrics from the Kubelet + cAdvisor on each node • A combination of both • Others?
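The DaemonSet variant can be created programmatically as well; the sketch below uses the official kubernetes Python client, with a placeholder image and namespace standing in for whatever per-node metrics fetcher is actually deployed:

    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    labels = {"app": "node-metrics-fetcher"}  # hypothetical component
    daemonset = client.V1DaemonSet(
        metadata=client.V1ObjectMeta(name="node-metrics-fetcher", namespace="monitoring"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="fetcher",
                            image="example.com/node-metrics-fetcher:0.1",  # placeholder image
                        )
                    ]
                ),
            ),
        ),
    )
    apps.create_namespaced_daemon_set(namespace="monitoring", body=daemonset)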
  22. ©2008–18 New Relic, Inc. All rights reserved How does the New Relic

    Kubernetes integration work under the hood? That topic: another talk