Frequent Updates and Stability
Hiroshi Hayakawa | @hhiroshell
Cloud Platform Developer & 12th Black Belt Engineer of CloudNative Stack, Yahoo Japan Corporation
Pods restart for a variety of reasons
– Frequent updates
– Automatic / manual scaling
– Maintenance of infrastructure
• We should consider restarts as something that can happen at any time
• Pods need to be stable even with frequent restarts
To tackle this issue, let's break it down into three perspectives:
– Ensure that the pod starts safely with sufficient preparation
– Shut it down truly gracefully
– Eliminate unexpected crashes
[Figure: rolling updates replacing v1 Pods with v2 Pods, then v3 Pods]
"Pod is Ready" means user requests will come into the pod
• Containers in the pod must be ready to receive actual requests before the pod's status becomes "Ready."
What it takes to be ready depends on the application 😅
• Examples:
– Prepare connections to a relational database
– Allocate sufficient initial heap memory
– Warm up the app and optimize it with JIT compilation
Pod startup timeline:
• On control plane nodes: a new pod is persisted to the API server, then the scheduler decides which Node to place the pod on
• On worker nodes (pre-start process): the kubelet on the Node starts to prepare an environment for the containers in the pod
• Containers start up (our concern): initContainers run sequentially, then each main container starts and runs its ENTRYPOINT command; the kubelet also runs the postStart lifecycle hook and the startup, readiness, and liveness probes
• Service in: once the pod becomes "Ready," requests come into the pod, so make the containers ready before this point
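To make the relationship between the probes concrete, here is a minimal sketch in which a startupProbe holds off the readiness and liveness probes until startup work has finished. The endpoint paths and thresholds are assumptions, not from the deck:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    ports:
    - containerPort: 8080
      name: http
    startupProbe:
      httpGet:
        path: /health/startup     # hypothetical endpoint
        port: http
      periodSeconds: 5
      failureThreshold: 60        # allow up to ~5 minutes for startup
    readinessProbe:               # runs only after the startupProbe has succeeded
      httpGet:
        path: /health/readiness
        port: http
      periodSeconds: 1
    livenessProbe:                # also held off until the startupProbe succeeds
      httpGet:
        path: /health/liveness    # hypothetical endpoint
        port: http
      periodSeconds: 10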
• initContainers
– Run as separate containers from the main containers
– Perform pre-processing before the main containers start (e.g., dynamically create configuration files for the main containers)
• postStart lifecycle hook
– Runs a different command in a main container than the startup command
– The pod will not become ready until the postStart lifecycle hook has finished
– Can be used to prepare the main containers before they accept requests (e.g., send warm-up requests to the main container with curl)
Example: postStart lifecycle hook

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    ...(snip)...
    ports:
    - containerPort: 8080
      name: http
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /health/readiness
        port: http
      periodSeconds: 1
    lifecycle:
      postStart:
        exec:
          command:
          - sh
          - -c
          - |-
            sleep 10
            for i in $(seq 10000); do
              curl -s http://localhost:8080/ > /dev/null
            done
      preStop:
        ...(snip)...

✓ Wait for the java-app to start.
✓ Send requests to the app in the same pod.
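The initContainers technique described above could look roughly like this; the config-generator image, its command, and the shared volume layout are hypothetical, shown only to illustrate the pattern:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  initContainers:
  - name: generate-config               # hypothetical pre-processing step
    image: config-generator:v1.0.0      # hypothetical image
    command: ["sh", "-c", "render-config > /config/application.properties"]   # hypothetical command
    volumeMounts:
    - name: config
      mountPath: /config
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    volumeMounts:
    - name: config
      mountPath: /app/config            # the main container reads the generated file from here
  volumes:
  - name: config
    emptyDir: {}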
Pods stop for various reasons
• If pods don't stop safely, incoming requests will end in failures
• When a pod stops due to heavy load, other pods become overloaded and stop one after another
[Diagram] Failure cascade: high traffic causes high memory usage (leading to Full GC or an OOM Kill), high CPU usage, or bottlenecks outside the pod*; the container can't respond to the liveness probe, and the kubelet terminates the pod. (* if the liveness probe depends on something outside the pod)
Countermeasure: enough load testing and tuning!
You can set "requests" and "limits" for each container in a Pod
– .spec.containers[].resources.requests.[cpu|memory]:
• quantity of resources guaranteed to be available to the container
– .spec.containers[].resources.limits.[cpu|memory]:
• limits for resource consumption beyond requests
[Figure: actual resource consumption of the container compared against resources.requests and resources.limits]
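A minimal sketch of these fields; the CPU and memory quantities below are placeholders for illustration, not recommendations:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    resources:
      requests:
        cpu: "500m"      # guaranteed to be available to the container
        memory: "1Gi"
      limits:
        cpu: "1"         # consuming beyond this leads to CPU throttling
        memory: "1Gi"    # consuming beyond this leads to an OOM Kill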
What happens when a container is going to consume more than resources.limits:
– CPU: throttling
– Memory: OOM Kill
• If an OOM Kill occurs, the application cannot handle OS signals and cannot shut down gracefully
JVM memory consumption consists of multiple areas such as heap memory, thread stack, etc.
• Tune JVM parameters to keep memory usage within the resource limit
Total memory consumption of the JVM = Heap + Thread Stack + Code Cache + Metaspace + Direct Memory + Native Memory
[Figure: heap and the other areas compared against the resource request and the resource limit 🤯]
By default in the JVM, the maximum heap memory is 20-30% of the resource limit. This often means that the resource requests/limits are left unused.
• You can write, for example, "-XX:MaxRAMPercentage=50.0" in the JVM flags to specify the heap memory as a percentage of the resource limit
[Figure: max size of heap memory and the other areas relative to resources.requests and resources.limits]
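One way to pass such a flag is through the JAVA_TOOL_OPTIONS environment variable, which the JVM picks up at startup. A minimal sketch, assuming the 50% value from the slide and a 1Gi memory limit:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    env:
    - name: JAVA_TOOL_OPTIONS
      value: "-XX:MaxRAMPercentage=50.0"   # heap max becomes ~50% of the memory limit
    resources:
      limits:
        memory: "1Gi"                      # with the flag above, heap max is roughly 512Mi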
[Diagram: the same failure cascade] Countermeasure: select an appropriate GC algorithm
In the container world, we often use Serial GC due to the small quantity of resources allocated
• Serial GC takes a long time for a Full GC. During a Full GC, the application cannot respond to the livenessProbe, which increases the risk that pods restart unnecessarily
[Figure: during a stop-the-world Full GC, the http livenessProbe request fails and the kubelet terminates the Java app container]
Consider using concurrent GC algorithms (e.g., G1GC) with two or more CPU cores
• More heap memory will increase the duration of a Full GC. Even just increasing the memory should not be done without testing
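A sketch of switching to G1GC with two CPU cores; the flag and the quantities are illustrative and should be validated with load testing, as the slide says:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    env:
    - name: JAVA_TOOL_OPTIONS
      value: "-XX:+UseG1GC -XX:MaxRAMPercentage=50.0"   # use G1GC instead of Serial GC
    resources:
      requests:
        cpu: "2"         # concurrent collectors benefit from two or more cores
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "1Gi"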
[Diagram: the same failure cascade] Countermeasure: make the livenessProbe stable
"The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress."
-- Kubernetes documentation
– A livenessProbe that becomes unresponsive under high load will cause a pod to restart just when you want it to keep running
– Sometimes the default (the container is judged healthy as long as the application process is running) is still better
[Figure: under too many user requests (some of which fail), the http livenessProbe request also fails and the kubelet terminates the Java app container]
[Diagram: the same failure cascade] Countermeasure: make the livenessProbe depend only on the inside of the pod
Don't let the livenessProbe depend on anything outside of the pod
– Situations outside of the Pod cause the Pod to be terminated
– If the cause is not in the Pod, restarting the Pod will not make the situation better
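For example, a liveness endpoint that reports only the state of the process itself, and does not call databases or other backends, keeps the probe's scope inside the pod. The endpoint path below is an assumption:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    livenessProbe:
      httpGet:
        path: /health/liveness   # hypothetical endpoint; it must not depend on external systems
        port: 8080
      periodSeconds: 10
      failureThreshold: 3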
During the shutdown process of a pod:
– the kubelet terminates containers
– kube-proxy stops traffic to the pod
• If containers start to be terminated before traffic stops, requests will result in errors
Pod shutdown timeline:
• Some kube-apiserver client sets a deletionTimestamp on the pod
• The preStop lifecycle hook starts; in parallel, kube-proxy stops traffic to the pod (service out: there is no incoming traffic to the pod from here)
• The kubelet sends SIGTERM to the container
• After deletionGracePeriodSeconds (default: 30s) has elapsed, the kubelet sends SIGKILL to the container
Two things to get right: wait for the service out, and handle SIGTERM to shut down gracefully.
How to achieve a graceful shutdown?
• Wait for the service out
– the kubelet sends a SIGTERM OS signal to the container after the preStop lifecycle hook exits
– wait until kube-proxy stops traffic to the pod by running a "sleep xx" command in the preStop hook
• Handle the SIGTERM signal in the application and shut down gracefully
– the application returns responses to all in-flight requests before it exits
– in many languages, SDKs or frameworks support it
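A minimal sketch combining the two points: sleep in the preStop hook to wait for the service out, and leave enough terminationGracePeriodSeconds for the SIGTERM handler to finish. The durations are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  terminationGracePeriodSeconds: 60         # must cover the preStop sleep plus in-flight requests
  containers:
  - name: java-app
    image: java-app-image:v1.0.0
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]   # wait for kube-proxy to stop sending traffic
    # after preStop exits, the kubelet sends SIGTERM; the application should then
    # finish its in-flight requests and exit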
With Spring Boot, you can enable SIGTERM handling by simply adding some properties to application.properties:

server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=15s