figure out where to run your pods. • CPU and memory are collectively referred to as compute resources. Compute resources are measurable quantities that can be requested, allocated, and consumed. • CPU is specified in units of cores, often written as millicores (for example, 100m is 0.1 core). • Memory is specified in units of bytes. https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types
Can be taken away very quickly • “Merely” cause slowness when revoked • e.g. CPU, disk time • Incompressible resource • Hold state • Are slower to be taken away • Can fail to be revoked • e.g. memory, disk space https://www.slideshare.net/damianigbe/kubernetes-scheduling-and-qos
cpu units • One CPU, in Kubernetes, is equivalent to: • 1 AWS vCPU • 1 GCP Core • 1 Azure vCore • 1 IBM vCPU • 1 Hyperthread on a bare-metal Intel processor with Hyperthreading • Unit form: fractional values are allowed, so 0.1 is equivalent to 100m (one hundred millicores); the form 100m might be preferred. • CPU is considered a “compressible” resource. https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types
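The CPU unit forms above can be illustrated with a small conversion helper. This is a minimal sketch, not Kubernetes' own quantity parser: the function name `cpu_to_millicores` is hypothetical, and the rejection of values finer than 1m is an assumption based on the documented precision limit.

```python
from fractions import Fraction


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "100m", "0.1", or "2" to whole millicores.

    Hypothetical helper for illustration only.
    """
    text = quantity.strip()
    if text.endswith("m"):
        # Already expressed in millicores, e.g. "100m".
        millis = Fraction(text[:-1])
    else:
        # Expressed in cores, e.g. "0.1" or "2".
        millis = Fraction(text) * 1000
    if millis.denominator != 1:
        # Assumption: precision finer than 1m is rejected.
        raise ValueError("precision finer than 1m: %r" % quantity)
    return int(millis)
```

With this helper, "0.1" and "100m" both map to 100 millicores, which is why the two forms are interchangeable in a Pod spec.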
bytes. • Unit form: • a plain integer, or a fixed-point number using one of these suffixes: E, P, T, G, M, k. • the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. • Memory is considered an “incompressible” resource. https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#resource-types
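The difference between the decimal and power-of-two suffixes is easy to miss, so here is a minimal sketch of a converter. The name `memory_to_bytes` is hypothetical, decimal kilo is written with a lowercase k (an assumption about how the real parser spells it), and this is not the Kubernetes quantity parser itself.

```python
from fractions import Fraction

# Decimal (power-of-ten) suffixes.
DECIMAL_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9,
                    "T": 10**12, "P": 10**15, "E": 10**18}
# Binary (power-of-two) suffixes.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
                   "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "128Mi", "128M", or "1024" to bytes.

    Hypothetical helper for illustration only.
    """
    text = quantity.strip()
    for suffix, factor in BINARY_SUFFIXES.items():
        if text.endswith(suffix):
            return int(Fraction(text[:-len(suffix)]) * factor)
    for suffix, factor in DECIMAL_SUFFIXES.items():
        if text.endswith(suffix):
            return int(Fraction(text[:-len(suffix)]) * factor)
    # No suffix: the value is already in bytes.
    return int(Fraction(text))
```

Note that 128Mi is 134217728 bytes while 128M is 128000000 bytes, a gap of roughly 5%.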
Pod CPU and Memory Resources
[Figure: how Pod resource settings map onto each layer]
• Kubernetes (Pod Manifest): CPU Requests, CPU Limits; MEM Requests, MEM Limits
• OS (Linux Kernel): CPU Shares, CPU Quota, CPU Period; OOM Score Adj., MEM Limits
• ESXi (Host): CPU Shares, CPU Reservation, CPU Limit; MEM Shares, MEM Reservation, MEM Limit
https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf
requests are scheduled? • When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled Containers is less than the capacity of the node. ➢ How Pods with resource limits are run? • When the kubelet starts a Container of a Pod, it passes the CPU and memory limits to the container runtime. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-requests-are-scheduled
cannot set requests that are larger than resources provided by your nodes. For example, if you have a cluster of dual-core machines, a Pod with a request of 2.5 cores will never be scheduled! • Pod: Each container in the Pod can set its own requests and limits, and these are all additive. https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-resource-requests-and-limits
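The additive rule and the dual-core example can be sketched as follows. This is an illustration under simplifying assumptions, not the scheduler's algorithm: the names `pod_requests` and `fits_on_node` are hypothetical, CPU is given in millicores and memory in bytes, and the node is assumed to be otherwise empty.

```python
def pod_requests(containers):
    """Sum per-container requests; a Pod's request is the total of its containers."""
    total = {"cpu_millicores": 0, "memory_bytes": 0}
    for container in containers:
        total["cpu_millicores"] += container.get("cpu_millicores", 0)
        total["memory_bytes"] += container.get("memory_bytes", 0)
    return total


def fits_on_node(containers, node_capacity):
    """Return True if the summed requests fit within the node's capacity.

    Assumption: nothing else is running on the node.
    """
    total = pod_requests(containers)
    return all(total[key] <= node_capacity[key] for key in total)


# A dual-core node with 4 GiB of memory (hypothetical values).
dual_core_node = {"cpu_millicores": 2000, "memory_bytes": 4 * 2**30}

# Two containers requesting 1.5 and 1.0 cores add up to 2.5 cores.
pod = [
    {"cpu_millicores": 1500, "memory_bytes": 256 * 2**20},
    {"cpu_millicores": 1000, "memory_bytes": 256 * 2**20},
]
```

Here `fits_on_node(pod, dual_core_node)` is False because the combined CPU request of 2500m exceeds the 2000m the node can offer, even though each container alone would fit.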
QoS Examples • Classes are calculated from the CPU and memory resource specifications (requests R and limits L). • BestEffort (Pod with 1 Container): 0 = R = L for all Containers • Guaranteed (Pod with 1 Container): 0 < R = L for all Containers • Burstable (Pod with 2 Containers): 0 < R <= (L) for at least one Container
to be given a QoS class of Guaranteed: • Every Container in the Pod must have a memory limit and a memory request, and they must be the same. • Every Container in the Pod must have a CPU limit and a CPU request, and they must be the same. https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed
own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit. • If a Container specifies its own CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit. https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed
given a QoS class of Burstable if: • The Pod does not meet the criteria for QoS class Guaranteed. • At least one Container in the Pod has a memory or CPU request. https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-burstable
to be given a QoS class of BestEffort: • The Containers in the Pod must not have any memory or CPU limits or requests. https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-besteffort
QoS classes by whether limits and requests are set (O = set, X = not set):
• limits O, requests O: Guaranteed
• limits O, requests X: Guaranteed
• limits X, requests O: Burstable
• limits X, requests X: BestEffort
QoS classes by resource value:
• limits = requests: Guaranteed
• limits > requests: Burstable
• limits < requests: Burstable
• QoS classes source code: kubernetes/pkg/apis/core/v1/helper/qos/qos.go https://github.com/kubernetes/kubernetes/blob/5713c22eecff4610026643fbd3d37c33a43c168d/pkg/apis/core/v1/helper/qos/qos.go
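The classification rules can be condensed into one function. This is a simplified sketch of the rules as stated on the slides, not a port of qos.go: the name `qos_class` and the dictionary layout are hypothetical, values are assumed to be already normalized (for example millicores and bytes), and a missing request is assumed to default to the limit.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Classify a Pod as "Guaranteed", "Burstable", or "BestEffort".

    Each container is a dict with optional "requests" and "limits" dicts
    keyed by "cpu" and "memory". Hypothetical layout for illustration.
    """
    anything_set = False
    all_guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # A request that is not specified defaults to the matching limit.
        requests = dict(limits)
        requests.update(container.get("requests", {}))
        if limits or requests:
            anything_set = True
        for resource in RESOURCES:
            # Guaranteed needs a limit and an equal request for CPU and memory.
            if resource not in limits or requests.get(resource) != limits[resource]:
                all_guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

For example, a container with only `limits` for both CPU and memory is classified as Guaranteed, because its requests are filled in from the limits.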
when available compute resources are low. This is especially important when dealing with incompressible compute resources, such as memory or disk space.
• Eviction signals: memory.available, nodefs.available, nodefs.inodesFree, imagefs.available, imagefs.inodesFree
• Default hard eviction thresholds: memory.available<100Mi, nodefs.available<10%, nodefs.inodesFree<5%, imagefs.available<15%
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#eviction-policy
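To make the threshold notation concrete, the sketch below checks an observed signal against the default hard eviction thresholds listed above. It is an illustration only, not kubelet code: the function name `threshold_met` and its arguments are hypothetical, and only the two value forms that appear in the defaults (a Mi quantity and a percentage) are handled.

```python
# Default hard eviction thresholds, as listed on the slide.
DEFAULT_HARD_THRESHOLDS = {
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%",
    "imagefs.available": "15%",
}


def threshold_met(signal, available, capacity, thresholds=DEFAULT_HARD_THRESHOLDS):
    """Return True if `available` has dropped below the threshold for `signal`.

    `available` and `capacity` use the same unit (bytes or inodes).
    """
    limit = thresholds[signal]
    if limit.endswith("%"):
        # Percentage thresholds are relative to the total capacity.
        return available * 100 < capacity * int(limit[:-1])
    if limit.endswith("Mi"):
        # Absolute thresholds are compared in bytes.
        return available < int(limit[:-2]) * 2**20
    raise ValueError("unsupported threshold form: %r" % limit)
```

For instance, a node with 50 MiB of memory available meets `memory.available<100Mi`, while a node filesystem with exactly 10% free does not meet `nodefs.available<10%`.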
eviction threshold with a required administrator-specified grace period. No action is taken by the kubelet to reclaim resources associated with the eviction signal until that grace period has been exceeded. • The following soft eviction threshold flags are supported: ◆ eviction-soft ◆ eviction-soft-grace-period ◆ eviction-max-pod-grace-period https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#soft-eviction-thresholds
grace period, and if observed, the kubelet will take immediate action to reclaim the associated starved resource. If a hard eviction threshold is met, the kubelet kills the Pod immediately with no graceful termination. • The following hard eviction threshold flag is supported: ◆ eviction-hard https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#hard-eviction-thresholds
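The contrast between soft and hard thresholds comes down to whether a grace period must elapse first. The sketch below captures only that decision; `should_reclaim` and its parameters are hypothetical names, and the real kubelet tracks considerably more state.

```python
def should_reclaim(kind, observed_for_seconds, grace_period_seconds=0):
    """Decide whether to start reclaiming resources for a met threshold.

    kind: "hard" or "soft".
    observed_for_seconds: how long the threshold has been continuously met.
    grace_period_seconds: only used for soft thresholds.
    """
    if kind == "hard":
        # Hard thresholds have no grace period: act as soon as they are met.
        return True
    if kind == "soft":
        # Soft thresholds wait until the grace period has been exceeded.
        return observed_for_seconds > grace_period_seconds
    raise ValueError("unknown threshold kind: %r" % kind)
```

A soft threshold with a 90 second grace period that has been met for 30 seconds triggers nothing yet, whereas a hard threshold triggers on first observation.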
reclaim sufficient resource on the node, the kubelet begins evicting Pods. • The kubelet ranks and evicts Pods in the following order: 1. BestEffort 2. Burstable 3. Guaranteed https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#evicting-end-user-pods
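The class order above can be expressed as a sort key. This sketch models only the three-step order stated on the slide and ignores any finer-grained ranking the kubelet may apply within a class; the names `EVICTION_RANK` and `eviction_order` and the Pod dictionaries are hypothetical.

```python
# Lower rank is evicted first.
EVICTION_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


def eviction_order(pods):
    """Return Pods sorted from first-evicted to last-evicted by QoS class.

    sorted() is stable, so Pods of the same class keep their input order.
    """
    return sorted(pods, key=lambda pod: EVICTION_RANK[pod["qos"]])


# Hypothetical Pods for illustration.
pods = [
    {"name": "db", "qos": "Guaranteed"},
    {"name": "batch", "qos": "BestEffort"},
    {"name": "web", "qos": "Burstable"},
]
```

Calling `eviction_order(pods)` puts `batch` first, then `web`, then `db`.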
for each container based on the quality of service for the Pod.
Quality of Service and oom_score_adj:
• Guaranteed: -998
• BestEffort: 1000
• Burstable: min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)
https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior
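A worked version of the oom_score_adj table helps show how the Burstable formula behaves at its edges. This is a sketch of the values given above; the function name `oom_score_adj` is hypothetical, and the use of integer division is an assumption, since the slide only writes "/".

```python
def oom_score_adj(qos, memory_request_bytes=0, machine_memory_capacity_bytes=1):
    """Return the oom_score_adj for a container, given its Pod's QoS class."""
    if qos == "Guaranteed":
        return -998
    if qos == "BestEffort":
        return 1000
    if qos == "Burstable":
        # Assumption: integer division, so the result is a whole number.
        raw = 1000 - (1000 * memory_request_bytes) // machine_memory_capacity_bytes
        # Clamp so Burstable always sits between Guaranteed and BestEffort.
        return min(max(2, raw), 999)
    raise ValueError("unknown QoS class: %r" % qos)
```

A Burstable container requesting 4 GiB on a 16 GiB machine gets 1000 - 250 = 750. The clamp keeps a container with no memory request at 999 and one requesting the whole machine at 2, so the larger the request, the less likely the container is to be chosen by the OOM killer.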
right away • The kubelet currently polls cAdvisor to collect memory usage stats at a regular interval. If memory usage increases rapidly within that window, the kubelet may not observe MemoryPressure fast enough, and the OOMKiller will still be invoked. • Viable workaround: set eviction thresholds at approximately 75% capacity.
Master Components) • It is never desired for the kubelet to evict a DaemonSet Pod, since the Pod is immediately recreated and rescheduled back to the same node. • Instead, a DaemonSet should ideally launch Guaranteed Pods. https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#daemonset
Lessons from the Field - Michael Gasch • https://www.youtube.com/watch?v=8-apJyr2gi0 • https://schd.ws/hosted_files/kccnceu18/33/Inside%20Kubernetes%20QoS%20M.%20Gasch%20KubeCon%20EU%20FINAL.pdf