Launches and manages containers on clusters ‣ Scaling and fault recovery of containers ‣ Service discovery and load balancing Examples include ‣ Kubernetes (K8s): Cloud Native Computing Foundation ‣ Docker Swarm: Docker ‣ Nomad: Hashicorp ‣ Marathon: Mesosphere 2
large- scale clusters. Requests must be rerouted appropriately in case of a failure. Traffic Surge Deploying Updates Incoming traffic often fluctuates violently. Server resources must be quickly scaled out to meet the demand. Modern web services are deployed hundreds or thousand times a day. Updates must be applied without stopping the whole service.
Originally designed by Google, donated to the Cloud Native Computing Foundation ‣ Scales up to 5,000 nodes and 300,000 containers ‣ Used in production at: 7 and many more Bloomberg “The Tale of Kubernetes”
changes in the cluster resources Compute diff between actual cluster state and desired cluster state stored in DB Reconcile diff by launching containers, killing containers, etc.
to “what” ‣ From imperative to declarative ‣ User requests the desired state, whereas the orchestrator executes appropriate procedures to reach the desired state Realized the shift from “fortress” to “weeble” ‣ Embrace mortality and aim for dynamic equilibrium ‣ As long as the cluster can provide services to users as a collective system, the life/death of individual nodes or containers doesn’t matter 11
cluster. ‣ It enables faster scaling, fault recovery, and deployment. ‣ It is changing cluster management from “how” to “what” ‣ It is bringing high resiliency to production clusters. 13 https://www.cafereo.co.jp/goods/124082
cluster management at Google with Borg (Paper) ‣ https://research.google.com/pubs/pub43438.html Scalable Microservices with Kubernetes (MOOC) ‣ https://www.udacity.com/course/scalable-microservices-with- kubernetes--ud615 Whack-a-pod: The Kubernetes cluster whack-a-mole game ‣ https://medium.com/google-cloud/whack-a-pod-359cbfb61662 14