Google as an open source container orchestration platform.
• Built on the lessons learned from developing and running Google’s Borg and Omega.
• Designed from the ground up as a loosely coupled collection of components centered around deploying, maintaining, and scaling workloads.
of distributed systems.
• Abstracts away the underlying hardware of the nodes and provides a uniform interface through which workloads are deployed and consume the shared pool of resources.
• Works as an engine for resolving state by converging the actual state of the system toward the desired state.
to its desired state.
• Me: “I want 3 healthy instances of redis to always be running.”
• Kubernetes: “Okay, I’ll ensure there are always 3 instances up and running.”
• Kubernetes: “Oh look, one has died. I’m going to attempt to spin up a new one.”
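The exchange above amounts to declaring a replica count and letting a controller reconcile toward it. A minimal sketch as a Deployment manifest, assuming a hypothetical name (`redis-example`), label (`app: redis`), and image tag that do not come from the slides:

```yaml
# Hypothetical example: the name, label, and image tag are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-example
spec:
  replicas: 3              # desired state: three instances at all times
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis         # must match spec.selector.matchLabels
    spec:
      containers:
        - name: redis
          image: redis:7-alpine
```

If one of the three Pods dies, the controller observes two running instances against a desired count of three and creates a replacement.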
Deployments
• Fire off jobs and scheduled cronjobs
• Manage Stateless and Stateful Applications
• Provide native methods of service discovery
• Easily integrate and support 3rd party apps
Kubernetes control plane and datastore.
• All clients and other applications interact with Kubernetes strictly through the API Server.
• Acts as the gatekeeper to the cluster by handling authentication and authorization, request validation, mutation, and admission control, in addition to being the front end to the backing datastore.
in relation to Kubernetes is to provide a strongly consistent and highly available key-value store for persisting cluster state.
• Stores objects and config information.
core component control loops.
• Monitors the cluster state via the apiserver and steers the cluster towards the desired state.
List of core controllers: https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-controller-manager/app/controllermanager.go#L344
attempts to place it on a matching resource.
• Default scheduler uses bin packing.
• Workload requirements can include: general hardware requirements, affinity/anti-affinity, labels, and other various custom resource requirements.
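The kinds of workload requirements listed above can be expressed directly in a Pod spec. The following is a sketch only; the Pod name, the `disktype: ssd` node label, and the request sizes are hypothetical values chosen for illustration:

```yaml
# Hypothetical example: names, labels, and sizes are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-example
  labels:
    app: nginx
spec:
  nodeSelector:
    disktype: ssd                      # label requirement on the node
  affinity:
    podAntiAffinity:                   # keep replicas on separate nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: nginx
          topologyKey: kubernetes.io/hostname
  containers:
    - name: nginx
      image: nginx:stable-alpine
      resources:
        requests:                      # general hardware requirements
          cpu: 250m
          memory: 128Mi
```

The scheduler only considers nodes that carry the label, have enough unreserved CPU and memory, and do not already run a Pod labeled `app: nginx`.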
the lifecycle of every pod on its host.
• Kubelet understands YAML container manifests that it can read from several sources:
  ◦ file path
  ◦ HTTP endpoint
  ◦ etcd watch acting on any changes
  ◦ HTTP server mode accepting container manifests over a simple API
capability into the core control loop of Kubernetes.
• The controllers include Node, Route, and Service, plus an additional controller that handles things such as PersistentVolume labels.
pod-to-pod communication managed by a CNI (Container Network Interface) plugin.
• Service Network
  ◦ Cluster-wide range of Virtual IPs managed by kube-proxy for service discovery.
plumbed via the Container Network Interface (CNI).
• Functions as an interface between the container runtime and a network implementation plugin.
• CNCF Project
• Uses a simple JSON Schema.
communicate with each other unimpeded.
• All Pods can communicate with all other Pods without NAT.
• All nodes can communicate with all Pods (and vice-versa) without NAT.
• The IP that a Pod sees itself as is the same IP that others see it as.
within the same network namespace and share an IP.
  ◦ Enables intrapod communication over localhost.
• Pod-to-Pod
  ◦ Each Pod is allocated a cluster-unique IP for the duration of its lifecycle.
  ◦ Pods themselves are fundamentally ephemeral.
a persistent cluster unique IP
  ◦ exists beyond a Pod’s lifecycle.
• External-to-Service
  ◦ Handled by kube-proxy.
  ◦ Works in cooperation with a cloud provider or other external entity (load balancer).
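For the Pod-to-Service case, a Service of type ClusterIP is what provides the persistent, cluster-unique virtual IP. A minimal sketch, assuming a hypothetical Service name and Pods labeled `app: nginx`:

```yaml
# Hypothetical example: the Service name and selector label are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: example-internal
spec:
  type: ClusterIP          # the default type; reachable only inside the cluster
  selector:
    app: nginx             # traffic is forwarded to Pods carrying this label
  ports:
    - protocol: TCP
      port: 80             # port exposed on the Service's virtual IP
      targetPort: 80       # port the selected Pods listen on
```

The virtual IP stays the same while the Pods behind it come and go, which is why clients address the Service rather than individual Pods.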
both understand and extend.
• An API Group is a REST compatible path that acts as the type descriptor for a Kubernetes object.
• Referenced within an object as the apiVersion and kind.
Format:
    /apis/<group>/<version>/<resource>
Examples:
    /apis/apps/v1/deployments
    /apis/batch/v1beta1/cronjobs
Also referenced within the object apiVersion.
• Alpha: Possibly buggy and may change. Disabled by default.
• Beta: Tested and considered stable; however, the API schema may change. Enabled by default.
• Stable: Released and stable; the API schema will not change. Enabled by default.
Format:
    /apis/<group>/<version>/<resource>
Examples:
    /apis/apps/v1/deployments
    /apis/batch/v1beta1/cronjobs
a persistent entity that represents the desired state of the object within the cluster.
• All objects MUST have apiVersion and kind, and possess the nested fields metadata.name, metadata.namespace, and metadata.uid.
Object
• kind: Type of Kubernetes Object
• metadata.name: Unique name of the Object
• metadata.namespace: Scoped environment name that the object belongs to (will default to current).
• metadata.uid: The (generated) uid for an object.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example
      namespace: default
      uid: f8798d82-1185-11e8-94ce-080027b3c7a6
Kubernetes Objects are generally represented in YAML.
• A “Human Friendly” data serialization standard.
• Uses whitespace alignment (specifically spaces) to denote ownership.
• Three basic data types:
  ◦ mappings - hash or dictionary
  ◦ sequences - array or list
  ◦ scalars - string, number, boolean, etc.
have an additional two nested fields, spec and status.
  ◦ spec - Describes the desired state or configuration of the object to be created.
  ◦ status - Is managed by Kubernetes and describes the actual state of the object and its history.
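To make the split concrete, the sketch below shows the rough shape of a Deployment as stored in the cluster. The values are hand-written for illustration, not captured output, and the object name `deploy-example` is reused here only as a placeholder:

```yaml
# Illustrative shape only: the values are assumptions, not real cluster output.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-example
spec:                      # written by the user: the desired state
  replicas: 3
  # selector and template omitted for brevity
status:                    # written by Kubernetes: the observed state
  replicas: 3
  updatedReplicas: 3
  readyReplicas: 2         # one Pod is not ready yet, so reconciliation continues
  availableReplicas: 2
```

Users edit only spec; the controllers update status as they work to close the gap between the two.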
the primary method of partitioning a cluster or scoping access.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: prod
      labels:
        app: MyBigWebApp

    $ kubectl get ns --show-labels
    NAME          STATUS    AGE       LABELS
    default       Active    11h       <none>
    kube-public   Active    11h       <none>
    kube-system   Active    11h       <none>
    prod          Active    6s        app=MyBigWebApp
LABELS
    default       Active    11h       <none>
    kube-public   Active    11h       <none>
    kube-system   Active    11h       <none>

• default: The default namespace for any object without a namespace.
• kube-system: Acts as the home for objects and resources created by Kubernetes itself.
• kube-public: A special namespace, readable by all users, that is reserved for cluster bootstrapping and configuration.
• Foundational building block of Kubernetes Workloads.
• A Pod is one or more containers that share volumes and a network namespace, and are part of a single context.
the container
• image - The container image
• ports - Array of ports to expose. Each can be given a friendly name, and a protocol may be specified.
• env - Array of environment variables
• command - Entrypoint array (equiv to Docker ENTRYPOINT)
• args - Arguments to pass to the command (equiv to Docker CMD)

Container

    name: nginx
    image: nginx:stable-alpine
    ports:
      - containerPort: 80
        name: http
        protocol: TCP
    env:
      - name: MYVAR
        value: isAwesome
    command: ["/bin/sh", "-c"]
    args: ["echo ${MYVAR}"]
and group together related sets of objects or resources.
• NOT characteristic of uniqueness.
• Have a strict syntax with a slightly limited character set*.
* https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set
Selector Types
Set-based selectors are supported on a limited subset of objects. However, they provide a method of filtering on a set of values and support multiple operators, including In, NotIn, and Exists.

    selector:
      matchExpressions:
        - key: gpu
          operator: In
          values: ["nvidia"]

    selector:
      matchLabels:
        gpu: nvidia
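A sketch of how several set-based requirements combine; the `env` and `zone` keys and their values are hypothetical. All entries under matchExpressions must hold at once (they are ANDed):

```yaml
# Hypothetical example: the keys and values are assumptions for illustration.
selector:
  matchExpressions:
    - key: gpu
      operator: In           # label value must be one of the listed values
      values: ["nvidia", "amd"]
    - key: env
      operator: NotIn        # label value must not be any of the listed values
      values: ["dev"]
    - key: zone
      operator: Exists       # label key must be present; no values allowed
# Equivalent kubectl selector string (lowercase operators):
#   kubectl get pods -l 'gpu in (nvidia,amd),env notin (dev),zone'
```

In manifests the operator names are capitalized (In, NotIn, Exists, DoesNotExist), while the command-line selector syntax uses the lowercase forms.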
Exposes a port on every node’s IP.
• Port can either be statically defined, or dynamically taken from a range between 30000-32767.

    apiVersion: v1
    kind: Service
    metadata:
      name: example-prod
    spec:
      type: NodePort
      selector:
        app: nginx
        env: prod
      ports:
        - nodePort: 32410
          protocol: TCP
          port: 80
          targetPort: 80
    type: LoadBalancer
    selector:
      app: nginx
      env: prod
    ports:
      - protocol: TCP
        port: 80
        targetPort: 80

• LoadBalancer services extend NodePort.
• Works in conjunction with an external system to map a cluster external IP to the exposed service.
    spec:
      type: ExternalName
      externalName: example.com

• ExternalName is used to reference endpoints OUTSIDE the cluster.
• Creates an internal CNAME DNS entry that aliases another.
off a provided template.
• Pod Templates are Pod specs with limited metadata.
• Controllers use Pod Templates to make actual pods.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-example
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx

    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
          - name: nginx
            image: nginx
Pod.
• selector: The label selector for the ReplicaSet. The ReplicaSet will manage ALL Pod instances that the selector targets, whether that is desired or not.

    apiVersion: apps/v1
    kind: ReplicaSet
    metadata:
      name: rs-example
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx
          env: prod
      template:
        <pod template>
Provide rollback functionality and update control.
• Updates are managed through the pod-template-hash label.
• Each iteration creates a unique label that is assigned to both the ReplicaSet and subsequent Pods.
Deployment to retain.
• strategy: Describes the method of updating the Pods based on the type. Valid options are Recreate or RollingUpdate.
  ◦ Recreate: All existing Pods are killed before the new ones are created.
  ◦ RollingUpdate: Cycles through updating the Pods according to the parameters maxSurge and maxUnavailable.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: deploy-example
    spec:
      replicas: 3
      revisionHistoryLimit: 3
      selector:
        matchLabels:
          app: nginx
          env: prod
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
      template:
        <pod template>
run an instance of the supplied Pod.
• They bypass default scheduling mechanisms.
• Are ideal for cluster wide services such as log forwarding or health monitoring.
• Revisions are managed via a controller-revision-hash label.
DaemonSet to retain.
• updateStrategy: Describes the method of updating the Pods based on the type. Valid options are RollingUpdate or OnDelete.
  ◦ RollingUpdate: Cycles through updating the Pods according to the value of maxUnavailable.
  ◦ OnDelete: The new instance of the Pod is deployed ONLY after the current instance is deleted.

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ds-example
    spec:
      revisionHistoryLimit: 3
      selector:
        matchLabels:
          app: nginx
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1
      template:
        spec:
          nodeSelector:
            nodeType: edge
          <pod template>
maintain state.
• Pod identity including hostname, network, and storage WILL be persisted.
• Assigned a unique ordinal name following the convention of ‘<statefulset name>-<ordinal index>’.
    2
    revisionHistoryLimit: 3
    selector:
      matchLabels:
        app: stateful
    serviceName: app
    updateStrategy:
      type: RollingUpdate
      rollingUpdate:
        partition: 0
    template:
      <pod template>

• revisionHistoryLimit: The number of previous iterations of the StatefulSet to retain.
• serviceName: The name of the associated headless service, that is, a service without a ClusterIP.
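A headless Service of the kind serviceName refers to might look like the sketch below. The Service name `app` and the `app: stateful` label mirror the manifest above; the port number and port name are assumptions:

```yaml
# Sketch of a headless Service; the port number and port name are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: app                # must match the StatefulSet's serviceName
spec:
  clusterIP: None          # "headless": no virtual IP is allocated
  selector:
    app: stateful
  ports:
    - name: web
      port: 80
```

With no ClusterIP, DNS resolves to the individual Pods, and each StatefulSet Pod gets a stable name of the form `<pod name>.<service name>.<namespace>.svc.<cluster domain>`.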
based on the type. Valid options are OnDelete or RollingUpdate.
  ◦ OnDelete: The new instance of the Pod is deployed ONLY after the current instance is deleted.
  ◦ RollingUpdate: Pods with an ordinal greater than or equal to the partition value will be updated one by one in reverse ordinal order.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: sts-example
    spec:
      replicas: 2
      revisionHistoryLimit: 3
      selector:
        matchLabels:
          app: stateful
      serviceName: app
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          partition: 0
      template:
        <pod template>
        containerPort: 80
      volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
    volumeClaimTemplates:
      - metadata:
          name: www
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: standard
          resources:
            requests:
              storage: 1Gi

• volumeClaimTemplates: Template of the persistent volume(s) request to use for each instance of the StatefulSet.
    ]
    storageClassName: standard
    resources:
      requests:
        storage: 1Gi

    $ kubectl get pvc
    NAME                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    www-sts-example-0   Bound     pvc-d2f11e3b-18d0-11e8-ba4f-080027a3682b   1Gi        RWO            standard       4h
    www-sts-example-1   Bound     pvc-d3c923c0-18d0-11e8-ba4f-080027a3682b   1Gi        RWO            standard       4h

Claim naming convention: <Volume Name>-<StatefulSet Name>-<ordinal>

Persistent Volumes associated with a StatefulSet will NOT be automatically garbage collected when their associated StatefulSet is deleted. They must be removed manually.
executed and successfully terminate.
• Will continue to try to execute the job until it satisfies the completion and/or parallelism condition.
• Pods are NOT cleaned up until the job itself is deleted.*
itself is considered failed.
• completions: The total number of successful completions desired.
• parallelism: How many instances of the pod can be run concurrently.
• spec.template.spec.restartPolicy: Jobs only support a restartPolicy of type Never or OnFailure.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: job-example
    spec:
      backoffLimit: 4
      completions: 4
      parallelism: 2
      template:
        spec:
          restartPolicy: Never
        <pod-template>
successfulJobsHistoryLimit: The number of successful jobs to retain.
• failedJobsHistoryLimit: The number of failed jobs to retain.

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: cronjob-example
    spec:
      schedule: "*/1 * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      jobTemplate:
        spec:
          completions: 4
          parallelism: 2
          template:
            <pod template>
exchanging data between containers, or persisting some form of data. For this we have Volumes, PersistentVolumes, PersistentVolumeClaims, and StorageClasses.
• A pod can have one or more types of volumes attached to it.
• Can be consumed by any of the containers within the pod.
• Survive container restarts within the Pod; however, their durability beyond that depends on the Volume Type.
attached to the Pod. Every object within the list must have its own unique name.
• volumeMounts: A container specific list referencing the Pod volumes by name, along with their desired mountPath.

    apiVersion: v1
    kind: Pod
    metadata:
      name: volume-example
    spec:
      containers:
        - name: nginx
          image: nginx:stable-alpine
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
              readOnly: true
        - name: content
          image: alpine:latest
          command: ["/bin/sh", "-c"]
          args:
            - while true; do date >> /html/index.html; sleep 5; done
          volumeMounts:
            - name: html
              mountPath: /html
      volumes:
        - name: html
          emptyDir: {}
• PVs are a cluster wide resource linked to a backing storage provider: NFS, GCEPersistentDisk, RBD, etc.
• Generally provisioned by an administrator.
• Their lifecycle is handled independently from a pod.
• CANNOT be attached to a Pod directly. Relies on a PersistentVolumeClaim.
storage.
• Satisfies a set of requirements instead of mapping to a storage resource directly.
• Ensures that an application’s ‘claim’ for storage is portable across numerous backends or providers.
        50Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteOnce
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Delete
      storageClassName: slow
      mountOptions:
        - hard
        - nfsvers=4.1
      nfs:
        path: /exports
        server: 172.22.0.42

PersistentVolume
• capacity.storage: The total amount of available storage.
• volumeMode: The type of volume; this can be either Filesystem or Block.
• accessModes: A list of the supported methods of accessing the volume. Options include:
  ◦ ReadWriteOnce
  ◦ ReadOnlyMany
  ◦ ReadWriteMany
deleted. Options include:
  ◦ Retain - manual clean-up
  ◦ Delete - storage asset deleted by provider.
• storageClassName: Optional name of the storage class that PVCs can reference. If provided, ONLY PVCs referencing the name can consume it.
• mountOptions: Optional mount options for the PV.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfsserver
    spec:
      capacity:
        storage: 50Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteOnce
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Delete
      storageClassName: slow
      mountOptions:
        - hard
        - nfsvers=4.1
      nfs:
        path: /exports
        server: 172.22.0.42
This MUST be a subset of what is defined on the target PV or Storage Class.
  ◦ ReadWriteOnce
  ◦ ReadOnlyMany
  ◦ ReadWriteMany
• resources.requests.storage: The desired amount of storage for the claim.
• storageClassName: The name of the desired Storage Class.

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: pvc-sc-example
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: slow
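Since a PV cannot be attached to a Pod directly, the Pod references the claim by name instead. A minimal sketch that consumes the `pvc-sc-example` claim above; the Pod name, volume name, and mount path are assumptions:

```yaml
# Hypothetical consumer Pod: its name, volume name, and mount path are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-consumer-example
spec:
  containers:
    - name: nginx
      image: nginx:stable-alpine
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-sc-example    # the PVC, not the PV, is referenced here
```

The Pod and the claim must live in the same namespace; which PV actually backs the claim is resolved by Kubernetes at binding time.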
consumed.
Bound: The PV has been bound to a claim.
Released: The binding PVC has been deleted, and the PV is pending reclamation.
Failed: An error has been encountered attempting to reclaim the PV.
an external storage resource (PV).
• Work hand-in-hand with the external storage system to enable dynamic provisioning of storage.
• Eliminates the need for the cluster admin to pre-provision a PV.
of the StorageClass.
2. StorageClass provisions the request through the API of the external storage system.
3. External storage system creates a PV strictly satisfying the PVC request.
4. Provisioned PV is bound to the requesting PVC.
provisioning of the external storage.
• parameters: A hash of the various configuration parameters for the provisioner.
• reclaimPolicy: The behaviour for the backing storage when the PVC is deleted.
  ◦ Retain - manual clean-up
  ◦ Delete - storage asset deleted by provider

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: standard
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-standard
      zones: us-central1-a, us-central1-b
    reclaimPolicy: Delete
referenced through several different means:
  ◦ environment variable
  ◦ a command line argument (via env var)
  ◦ injected as a file into a volume mount
• Can be created from a manifest, literals, directories, or files directly.
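The three means of reference listed above can be sketched in one manifest pair. Every name, key, and value here is hypothetical and chosen only for illustration:

```yaml
# Hypothetical example: all names, keys, and values are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-example
data:
  LOG_LEVEL: info
  app.properties: |
    greeting=hello
---
apiVersion: v1
kind: Pod
metadata:
  name: configmap-consumer
spec:
  containers:
    - name: app
      image: alpine:latest
      command: ["/bin/sh", "-c"]
      # command line argument built from the env var (expanded by the shell)
      args: ["echo level=${LOG_LEVEL}; cat /etc/config/app.properties; sleep 3600"]
      env:
        - name: LOG_LEVEL              # environment variable from one key
          valueFrom:
            configMapKeyRef:
              name: configmap-example
              key: LOG_LEVEL
      volumeMounts:
        - name: config                 # each key becomes a file in this directory
          mountPath: /etc/config
  volumes:
    - name: config
      configMap:
        name: configmap-example
```

Mounting as a volume suits whole configuration files, while configMapKeyRef suits individual settings.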
base64 encoded content.
• Encrypted at rest within etcd (if configured!).
• Ideal for username/passwords, certificates, or other sensitive information that should not be stored in a container.
• Can be created from a manifest, literals, directories, or from files directly.
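A minimal sketch of a Secret manifest, assuming a hypothetical name and throwaway credentials. The values under data are base64 encoded, which is an encoding and not encryption:

```yaml
# Hypothetical example: the name and credential values are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: secret-example
type: Opaque
data:
  username: YWRtaW4=            # base64 of "admin"
  password: MWYyZDFlMmU2N2Rm    # base64 of "1f2d1e2e67df"
```

Anyone who can read the object can decode these values, so access control and encryption at rest still matter. A Pod can consume the keys through `secretKeyRef` environment variables or a `secret` volume.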
smaller self-managed communities known as Special Interest Groups (SIGs).
• Hold weekly public recorded meetings and have their own mailing lists and Slack channels.
time-bounded, or act as a focal point for cross-SIG coordination.
• Hold scheduled publicly recorded meetings in addition to having their own mailing lists and Slack channels.
https://www.katacoda.com/courses/kubernetes
• Learn Kubernetes the Hard Way
  https://github.com/kelseyhightower/kubernetes-the-hard-way
• Official Kubernetes YouTube Channel
  https://www.youtube.com/c/KubernetesCommunity
• Official CNCF YouTube Channel
  https://www.youtube.com/c/cloudnativefdn
• Track to becoming a CKA/CKAD (Certified Kubernetes Administrator/Application Developer)
  https://www.cncf.io/certification/expert/
• Awesome Kubernetes
  https://www.gitbook.com/book/ramitsurana/awesome-kubernetes/details