Can we do better?
1. Place multiple applications on one machine
2. Partition the physical machine - VMs
3. Partition the resources on a physical machine - cgroups, namespaces (isolation)
Smarter node management: capacity vs. usage
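As a rough illustration of option 3, here is a minimal sketch of the cgroup v1 filesystem interface, assuming cgroup v1 is mounted at /sys/fs/cgroup and the program runs as root; the group name "demo" and the 64 MiB limit are arbitrary examples, not anything prescribed by the deck:

```go
// Sketch: limiting a process's memory with cgroups (v1 filesystem API).
// The group name "demo" and the 64 MiB cap are invented for illustration.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	cg := "/sys/fs/cgroup/memory/demo"

	// Create the cgroup; the kernel populates its control files.
	if err := os.MkdirAll(cg, 0755); err != nil {
		panic(err)
	}

	// Cap memory for every process in the group at 64 MiB.
	limit := []byte(fmt.Sprintf("%d", 64*1024*1024))
	if err := os.WriteFile(filepath.Join(cg, "memory.limit_in_bytes"), limit, 0644); err != nil {
		panic(err)
	}

	// Move the current process into the group.
	pid := []byte(fmt.Sprintf("%d", os.Getpid()))
	if err := os.WriteFile(filepath.Join(cg, "tasks"), pid, 0644); err != nil {
		panic(err)
	}
}
```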
cAdvisor: analyzes resource usage and performance of applications
• Google OSS project; written in Go; tiny resource footprint
• Supports Docker containers natively; LXC and raw cgroups also supported
• Understands CPU, memory, filesystem, and network utilization
• Easy-to-use REST API
• Runs in a Docker container
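Since the REST API is the main integration point, here is a minimal sketch of querying it, assuming cAdvisor is listening on localhost:8080; the exact API version in the path varies by release:

```go
// Sketch: reading machine stats from cAdvisor's REST API.
// Assumes cAdvisor on localhost:8080; the /api/v1.3/ version
// segment may differ depending on the cAdvisor release.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	resp, err := http.Get("http://localhost:8080/api/v1.3/machine")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON: cores, memory, filesystems, ...
}
```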
Heapster: cluster monitoring using cAdvisor
• Default monitoring solution in Kubernetes
• Filesystem-based API to support other cluster management systems
  • CoreOS supported via the filesystem API
• Discovers and collects stats from the cAdvisors running on all nodes
• Pushes data to InfluxDB or BigQuery
• Typical setup: Heapster + InfluxDB + Grafana
Kubernetes: Greek for "helmsman"; also the root of the word "governor"
• Container orchestrator
• Runs Docker containers
• Supports multiple cloud and bare-metal environments
• Inspired and informed by Google's experience
• Open source, written in Go
Manage applications, not machines
Container: a sealed application package (Docker)
Pod: a small group of tightly coupled containers; example: content syncer & web server
Controller: a loop that drives current state towards desired state; example: replication controller
Service: a set of running pods that work together; example: load-balanced backends
Labels: identifying metadata attached to other objects; example: phase=canary vs. phase=prod
Selector: a query against labels, producing a set result; example: all pods where label phase == prod
Declarative > imperative: State your desired results, let the system actuate
Control loops: Observe, rectify, repeat
Simple > Complex: Try to do as little as possible
Modularity: Components, interfaces, & plugins
Legacy compatible: Requiring apps to change is a non-starter
Network-centric: IP addresses are cheap
No grouping: Labels are the only groups
Cattle > Pets: Manage your workload in bulk
Open > Closed: Open Source, standards, REST, JSON, etc.
Control loops: drive current state -> desired state
• Act independently
• APIs - no shortcuts or back doors
• Observed state is truth
• A recurring pattern in the system
• Example: ReplicationController
[Diagram: a control loop cycling observe -> diff -> act]
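A minimal sketch of the observe-diff-act pattern described above; the pod-counting and start/stop stubs are invented stand-ins for what would really be API server calls:

```go
// Sketch: the observe-diff-act control loop. Everything here is
// illustrative; a real controller talks to the API server rather
// than these stub functions.
package main

import "time"

func main() {
	desired := 4
	for {
		observed := countRunningPods() // observe
		switch {
		case observed < desired: // diff...
			startPods(desired - observed) // ...act
		case observed > desired:
			stopPods(observed - desired)
		}
		time.Sleep(5 * time.Second) // repeat
	}
}

// Stubs standing in for API calls.
func countRunningPods() int { return 3 }
func startPods(n int)       {}
func stopPods(n int)        {}
```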
Storage for all master state
• Hidden behind an abstract interface
• Stateless means scalable
• Watchable
  • this is a fundamental primitive
  • don't poll, watch
• Using CoreOS etcd
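To illustrate "don't poll, watch", here is a sketch of a long-poll watch against etcd's v2 HTTP API, assuming etcd listens on 127.0.0.1:2379; the key /registry/example is made up, and the API shape differs in later etcd versions:

```go
// Sketch: watching a key via etcd's v2 HTTP API. The ?wait=true
// query blocks the request until the key next changes, so the
// loop only wakes on real events instead of polling on a timer.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	url := "http://127.0.0.1:2379/v2/keys/registry/example?wait=true"
	for {
		resp, err := http.Get(url) // blocks until the key changes
		if err != nil {
			panic(err)
		}
		event, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("change observed: %s\n", event)
	}
}
```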
Pods: small groups of containers & volumes
• Tightly coupled
• The atom of scheduling
• Shared namespace
  • share IP address & localhost
• Ephemeral
  • can die and be replaced
• Example: data puller & web server
[Diagram: a pod containing a file puller and a web server sharing a volume; a content manager feeds the puller, consumers hit the web server]
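The data puller & web server example can be sketched as a single program in which two "containers" (goroutines) share a "volume" (a directory); all names and the generated content are invented for illustration:

```go
// Sketch of the data-puller + web-server pod pattern in one process:
// one goroutine refreshes content on a shared directory, another
// serves it over HTTP, mirroring two containers sharing a volume.
package main

import (
	"net/http"
	"os"
	"path/filepath"
	"time"
)

func main() {
	vol, err := os.MkdirTemp("", "shared-volume")
	if err != nil {
		panic(err)
	}

	// "File puller" container: periodically refreshes the content.
	go func() {
		for {
			content := []byte(time.Now().Format(time.RFC3339))
			os.WriteFile(filepath.Join(vol, "index.html"), content, 0644)
			time.Sleep(10 * time.Second)
		}
	}()

	// "Web server" container: serves whatever is on the volume.
	http.Handle("/", http.FileServer(http.Dir(vol)))
	http.ListenAndServe(":8080", nil)
}
```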
Pod IPs are routable
• Docker default is a private IP
Pods can reach each other without NAT
• even across nodes
Pods can egress traffic
• if allowed by the cloud environment
No brokering of port numbers
This is a fundamental requirement
• several SDN solutions can provide it
Once scheduled to a node, pods do not move
• restart policy means restart in-place
Pods can be observed as pending, running, succeeded, or failed
• failed is really the end - no more restarts
• no complex state machine logic
Pods are not rescheduled by the scheduler or apiserver
• even if a node dies
• controllers are responsible for this
• keeps the scheduler simple
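A small sketch of those phases as plain named states, capturing the point that succeeded and failed are terminal and there is deliberately no richer state machine; the helper name is invented:

```go
// Sketch: pod phases as named states. "Succeeded" and "Failed"
// are terminal; nothing here transitions backwards.
package main

type PodPhase string

const (
	Pending   PodPhase = "Pending"   // accepted, not yet running
	Running   PodPhase = "Running"   // bound to a node, containers up
	Succeeded PodPhase = "Succeeded" // all containers exited cleanly
	Failed    PodPhase = "Failed"    // the end - no more restarts
)

// terminal reports whether a pod can still change phase.
func terminal(p PodPhase) bool {
	return p == Succeeded || p == Failed
}

func main() {}
```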
Labels: arbitrary metadata attached to any API object
• Generally represent identity
• Queryable by selectors
  • think SQL 'select ... where ...'
• The only grouping mechanism
  • pods under a ReplicationController
  • pods in a Service
  • capabilities of a node (constraints)
Example: "phase: canary"
[Diagram: four pods labeled {App: Nifty, Phase: Dev, Role: FE}, {App: Nifty, Phase: Dev, Role: BE}, {App: Nifty, Phase: Test, Role: FE}, {App: Nifty, Phase: Test, Role: BE}]
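A sketch of selector evaluation over the example label sets above; matchAll is an invented helper for illustration, not a Kubernetes API:

```go
// Sketch: a selector is a query over labels, producing a set result.
// The four pods mirror the example label sets from the slide.
package main

import "fmt"

type Labels map[string]string

// matchAll: true if every key/value in the selector is present
// with the same value in the object's labels.
func matchAll(selector, labels Labels) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	pods := []Labels{
		{"App": "Nifty", "Phase": "Dev", "Role": "FE"},
		{"App": "Nifty", "Phase": "Dev", "Role": "BE"},
		{"App": "Nifty", "Phase": "Test", "Role": "FE"},
		{"App": "Nifty", "Phase": "Test", "Role": "BE"},
	}
	// 'select ... where App == Nifty and Phase == Dev'
	selector := Labels{"App": "Nifty", "Phase": "Dev"}
	for _, p := range pods {
		if matchAll(selector, p) {
			fmt.Println(p)
		}
	}
}
```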
ReplicationController: the canonical example of control loops
• Runs out-of-process wrt the API server
• Has one job: ensure N copies of a pod
  • if too few, start new ones
  • if too many, kill some
  • group == selector
• Cleanly layered on top of the core
  • all access is via public APIs
Example: Replication Controller with Name = "nifty-rc", Selector = {"App": "Nifty"}, PodTemplate = { ... }, NumReplicas = 4
[Diagram: the controller asks the API server "How many?" (3), says "Start 1 more" (OK), asks again "How many?" (4)]
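A sketch of the controller's shape and one reconcile pass matching the exchange in the diagram; listPods, startPod, and killPod are invented stand-ins for the public API calls:

```go
// Sketch: a ReplicationController's fields and a single reconcile
// pass. The stub functions stand in for API-server requests.
package main

type ReplicationController struct {
	Name        string
	Selector    map[string]string
	PodTemplate string // elided; a full pod spec in reality
	NumReplicas int
}

func reconcile(rc ReplicationController) {
	have := len(listPods(rc.Selector)) // "How many?"
	for ; have < rc.NumReplicas; have++ {
		startPod(rc.PodTemplate) // "Start 1 more"
	}
	for ; have > rc.NumReplicas; have-- {
		killPod(rc.Selector)
	}
}

// Stubs standing in for public API calls.
func listPods(sel map[string]string) []string { return []string{"a", "b", "c"} }
func startPod(tmpl string)                    {}
func killPod(sel map[string]string)           {}

func main() {
	reconcile(ReplicationController{
		Name:        "nifty-rc",
		Selector:    map[string]string{"App": "Nifty"},
		NumReplicas: 4,
	})
}
```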
Services: a group of pods that act as one
• group == selector
• Defines an access policy
  • only "load balanced" for now
• Gets a stable virtual IP and port
  • called the service portal
  • soon to have DNS
• VIP is captured by kube-proxy
  • watches the service constituency
  • updates when backends change
• Hides complexity - ideal for non-native apps
[Diagram: clients connect to the portal (VIP), which fans out to the pods]
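A conceptual sketch of what the portal provides: clients dial one stable address and a proxy round-robins connections across backends. The addresses are placeholders, and the real kube-proxy keeps its backend set fresh by watching the service's selector rather than using a fixed list:

```go
// Sketch: a stable "portal" address in front of changing backends.
// Clients only ever see :80; the proxy picks a pod per connection.
package main

import (
	"io"
	"net"
	"sync/atomic"
)

func main() {
	backends := []string{"10.0.1.2:9376", "10.0.2.7:9376"} // placeholder pod IPs
	var next uint64

	ln, err := net.Listen("tcp", ":80") // the stable portal
	if err != nil {
		panic(err)
	}
	for {
		client, err := ln.Accept()
		if err != nil {
			continue
		}
		// Round-robin over the current backend set; a real proxy
		// updates this set as the service constituency changes.
		b := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
		go proxy(client, b)
	}
}

func proxy(client net.Conn, backend string) {
	defer client.Close()
	server, err := net.Dial("tcp", backend)
	if err != nil {
		return
	}
	defer server.Close()
	go io.Copy(server, client)
	io.Copy(client, server)
}
```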
Cluster services: monitoring, logging, DNS, etc.
• All run as pods in the cluster - no special treatment, no back doors
• Open-source solutions for everything
  • cAdvisor + InfluxDB + Heapster == cluster monitoring
  • fluentd + elasticsearch + kibana == cluster logging
  • skydns + kube2sky == cluster DNS
• Can easily be replaced by custom solutions
  • modular clusters to fit your needs
Status: open sourced in June 2014
• Google just launched Google Container Engine (GKE)
  • hosted Kubernetes
  • https://cloud.google.com/container-engine/
• Roadmap: https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/roadmap.md
• Driving towards a 1.0 release in O(months)
To sum up:
• Containers are a new way of working
• They require new concepts and new tools
• Google has a lot of experience... but we are listening to the users
• Workload portability is important!
Kubernetes is Open Source. We want your help!
• http://kubernetes.io
• https://github.com/google/cadvisor
• https://github.com/GoogleCloudPlatform/heapster
• irc.freenode.net #google-containers
Why containers?
• Repeatability
• Isolation
• Quality of service
• Accounting
• Visibility
• Portability
A fundamentally different way of managing applications
Images by Connie Zhou
Docker: container packaging & management
• Easy to use
• Build, test, and deploy - anywhere
• Provides resource isolation and security
• A big ecosystem exists around Docker
• WIP: better resource isolation, hardening, performance, etc.