Multi-tenancy Best Practices for Google Kubernetes Engine

Video: https://www.youtube.com/watch?v=RkY8u1_f5yY

In this talk, we cover the Kubernetes APIs and GKE features that allow you to create a multi-tenant cluster.

Refer to the documentation at https://cloud.google.com/kubernetes-engine/ or https://kubernetes.io for up-to-date instructions, as the information in this slide deck can go out of date quickly.

Ahmet Alp Balkan

July 26, 2018

Transcript

  1. IO232: Multi-Tenancy Best Practices for Google Kubernetes Engine.

    Ahmet Alp Balkan, Software Engineer, Google Cloud; Yoshi Tamura, Product Manager, Google Cloud. Thursday, July 26. 1
  2. Who are we? Ahmet Alp Balkan (@ahmetb) Software Engineer at

    Developer Relations I work on making Kubernetes Engine easier to understand and use for developers and operators and write open source tools for Kubernetes. Previously, I worked at Microsoft Azure, on porting Docker to Windows and ACR. I maintain "kubectx". 2
  3. Yoshi Tamura (@yoshiat) Product Manager, Kubernetes Engine I work on

    Multi-tenancy and Hardware Accelerators (GPU and Cloud TPU) in Kubernetes Engine. Who are we? 3
  4. Practical Multi-Tenancy on Kubernetes Engine. The following slides are heavily inspired by

    the KubeCon EU '18 talk of David Oppenheimer, Software Engineer, Google 4 Register your interest at: gke.page.link/multi-tenancy
  5. trust · multi-tenancy modes · isolation · access control · resource usage · scheduling ·

    multi-tenancy features · policy management · preventing contention · billing 5
  6. 8

  7. Do you trust... • Your compiler* • Operating system • Dependencies • Deployment

    pipeline • Container runtime ... * Bonus reading on compilers: - Reflections on Trusting Trust. Ken Thompson. CACM 27, 8 (August 1984), 761-763. - Fully Countering Trusting Trust through Diverse Double-Compiling. D. A. Wheeler. PhD thesis, George Mason University, Oct. 2009. 11
  8. Levels of trust in software multi-tenancy. Trusted: the code

    comes from an audited source, built and run by trusted components (a.k.a. “the dream”). Semi-trusted: trusted code, but has 3rd-party dependencies or software not fully audited (a.k.a. most people). Non-trusted: the code comes from potentially hostile users, cannot assume good intent (a.k.a. hosting providers). 12
  9. Cluster per Tenant. Pros: • Separate control plane (API) for each tenant

    (for free*) • Strong network isolation (if it's per-cluster VPC) However: • Need tools to manage 10s or 100s of clusters • Resource/configuration fragmentation of clusters • Slow turn-up: need to create a cluster for a new tenant * Google Kubernetes Engine control plane (master) is free of charge. (diagram: one cluster per tenant, spread across project-1 and project-2) 15
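    For illustration, turning up a dedicated cluster for a new tenant might look like the following sketch (project, zone, and cluster names are hypothetical):

        # one cluster per tenant, in the tenant's own project
        gcloud container clusters create tenant-a \
            --project=tenant-a-project \
            --zone=us-central1-a \
            --num-nodes=3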
  10. Namespace per tenant (intra-cluster multi-tenancy). Namespaces provide logical isolation between

    tenants on a cluster. Kubernetes policies are namespace-scoped. • Logical isolation between tenants • Policies for API access restrictions & resource usage constraints Pros: • Tenants can reuse extensions/controllers/CRDs • Shared control plane (=shared ops, shared security/auditing…) (diagram: namespaces ns1–ns4 in one cluster) 16
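    Onboarding a tenant then amounts to creating a namespace; a minimal sketch (tenant name is hypothetical):

        apiVersion: v1
        kind: Namespace
        metadata:
          name: tenant-a
          labels:
            team: tenant-a

    applied with kubectl apply -f tenant-a-namespace.yaml (or simply kubectl create namespace tenant-a).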
  11. Kubernetes Engine primitives: Quotas · Network Policy · Pod Security Policy · Pod

    Priority · Limit Range · IAM · Sandbox Pods · RBAC · Pod Affinity/Anti-Affinity · Admission Control (grouped on the slide into Access Control, Resource Sharing, Runtime Isolation) 17
  12. Multi-tenancy use cases in Kubernetes: Enterprise, SaaS (Software as a Service),

    KaaS (Kubernetes as a Service) 19
  13. All users from the same company/organization Namespaces ⇔ Tenants ⇔

    Teams Semi-trusted tenants (you can fire them on violation) Cluster Roles: • Cluster Admin ◦ CRUD any policy objects ◦ Create/assign namespaces to “Namespace Admins” ◦ Manage policies (resource usage quotas, networking) • Namespace Admin ◦ Manage users in the namespace(s) they own. • User ◦ CRUD non-policy objects in the namespace(s) they have access to “Enterprise” Model Control Plane (apiserver) Cluster Admin ns2 ns3 ns4 ns1 Namespace Admin Namespace Admin Namespace Admin 20
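    As a sketch of the "Namespace Admin" persona above, the built-in "admin" ClusterRole can be bound within a single namespace (user and namespace names are hypothetical):

        kubectl create rolebinding ns1-admins \
            --clusterrole=admin \
            --user=ns1-admin@example.com \
            --namespace=ns1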
  14. Many apps from different teams, semi-trusted • Vanilla container isolation

    may suffice • If not: Sandboxing with gVisor, limit capabilities, use seccomp/AppArmor/... Network isolation: • Allow all traffic within a namespace • Whitelist traffic from/to other namespaces (=teams) “Enterprise” Model Control Plane (apiserver) Cluster Admin ns2 ns3 ns4 ns1 Namespace Admin Namespace Admin Namespace Admin 21
  15. “Software as a Service” model Control Plane (apiserver) Cluster Admin

    SaaS API/proxy SaaS Consumers cluster 22 Consumer deploys their app through a custom control plane.
  16. “Software as a Service” model 23 Control Plane (apiserver) Cluster

    Admin SaaS API/proxy SaaS Consumers cluster Consumer deploys their app through a custom control plane. After the app is deployed, customers directly connect to the app. Example: Wordpress hosting
  17. “Software as a Service” model 24 Control Plane (apiserver) Cluster

    Admin SaaS API/proxy SaaS Consumers cluster Consumer deploys their app through a custom control plane. After the app is deployed, customers directly connect to the app. Example: Wordpress hosting SaaS API is a trusted client of Kubernetes. Cluster admins can access the Kubernetes API directly. Tenant workloads may have untrusted pieces: • such as WordPress extensions • may require sandboxing with gVisor etc.
  18. Untrusted tenants running untrusted code. (Platform as a Service or

    hosting companies.) Tenants may create their namespaces, but cannot set policy objects. Stronger isolation requirements than enterprise/SaaS: • isolated world view (separate control plane) • tenants must not see each other • strong node and network isolation ◦ sandbox pods ◦ sole-tenant nodes ◦ multi-tenant networking/DNS “Kubernetes as a Service” model Control Plane (apiserver) Cluster Admin ns1 ns2 ns3 ns4 25
  19. Untrusted tenants running untrusted code. (Platform as a Service or

    hosting companies.) Tenants may create their namespaces, but cannot set policy objects. Stronger isolation requirements than enterprise/SaaS: • isolated world view (separate control plane) • tenants must not see each other • strong node and network isolation ◦ sandbox pods ◦ sole-tenant nodes ◦ multi-tenant networking/DNS “Kubernetes as a Service” model Control Plane (apiserver) Cluster Admin ns1 ns2 ns3 ns4 26
  20. Kubernetes Engine multi-tenancy primitives: Quotas · Network Policy · Pod Security Policy

    · Pod Priority · Limit Range · IAM · Sandbox Pods · RBAC · Pod Security Context · Pod Affinity · Admission Control (grouped on the slide into Access Control, Resource Sharing, Runtime Isolation) 28
  21. Kubernetes Engine multi-tenancy primitives, split into auth-related and scheduling-related: Quotas · Network Policy

    · Pod Priority · Limit Range · IAM · Sandbox Pods · RBAC · Pod Security Context · Pod Affinity · Admission Control · Pod Security Policy 29
  22. Authentication, Authorization, Admission. (diagram: a request to the Control Plane (apiserver)

    passes through the Authorizer – Pluggable Auth (GKE IAM) backed by Cloud IAM Policies, plus RBAC backed by {Cluster,}Role / {Cluster,}RoleBinding – then Admission Control; only if allowed is the object persisted to etcd. Pods appear as API clients.) 31
  23. Kubernetes RBAC. Mostly useful for: • Giving access to pods

    calling the Kubernetes API (with Kubernetes Service Accounts) • Giving fine-grained access to people/groups calling the Kubernetes API (with Google accounts) Concepts: • ClusterRole: a preset of capabilities, cluster-wide • Role: ClusterRole, but namespace-scoped • ClusterRoleBinding: gives the permissions of a ClusterRole to Google users/groups, Google Cloud IAM Service Accounts, or Kubernetes Service Accounts • RoleBinding: ClusterRoleBinding, but namespace-scoped. 33
  24. Kubernetes RBAC example: ClusterRole + ClusterRoleBinding for namespace-creator:

    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: "namespace-creator"
    rules:
    - apiGroups: [""]   # core
      resources: ["namespaces"]
      verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: "admins:namespace-creator"
    roleRef:
      kind: ClusterRole   # a ClusterRoleBinding must reference a ClusterRole
      name: "namespace-creator"
      apiGroup: rbac.authorization.k8s.io
    subjects:
    - kind: User
      name: "[email protected]"   # Google user
      apiGroup: rbac.authorization.k8s.io
  25. Practical for giving Google users/groups project-wide access. Curated IAM “Roles” for

    Kubernetes Engine + Cloud IAM: • Admin: *can do everything* • Viewer: *can view everything* • Cluster Admin: can manage clusters (create/delete/upgrade clusters), cannot view what's in the clusters (Kubernetes API) • Developer: can do everything in a cluster (Kubernetes API), cannot manage clusters (create/delete/upgrade clusters). You can curate new ones with Cloud IAM Custom Roles. 35
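    A sketch of curating a narrower custom role with Cloud IAM (the role ID, title, and exact permission set here are illustrative; check the IAM permissions reference):

        gcloud iam roles create gkeNamespaceViewer \
            --project=PROJECT_ID \
            --title="GKE Namespace Viewer" \
            --permissions=container.namespaces.get,container.namespaces.list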
  26. Kubernetes Engine + IAM. Give someone the "Developer" role on all

    clusters in the project:
        gcloud projects add-iam-policy-binding PROJECT_ID \
            --member=user:[email protected] \
            --role=roles/container.developer
    Give a Google Group the "Viewer" role on all clusters in the project:
        gcloud projects add-iam-policy-binding PROJECT_ID \
            --member=group:[email protected] \
            --role=roles/container.viewer
  27. Admission Controls intercept an API request before the resource is persisted. Admission

    control can mutate and allow/deny. (diagram: request → Admission Control / Admission Plugins → allow → etcd) 37
  28. Admission Controls are compiled into the Kubernetes apiserver binary. Enabled admission plugins cannot

    be changed on Kubernetes Engine, but these 15 admission plugins are already enabled: Initializers, NamespaceLifecycle, LimitRanger, ServiceAccount, PersistentVolumeLabel, DefaultStorageClass, DefaultTolerationSeconds, NodeRestriction, PodPreset, ExtendedResourceToleration, PersistentVolumeClaimResize, Priority, StorageObjectInUseProtection, MutatingAdmissionWebhook, ValidatingAdmissionWebhook. 38
  29. Extending Admission Controls: you can develop webhooks to create your

    own Admission Controllers. (diagram: alongside the other admission plugins, MutatingAdmissionWebhook and ValidatingAdmissionWebhook call out to <your webhooks> before a request is allowed into etcd) 39
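    As a hedged sketch of such a registration (service name, namespace, and path are hypothetical), a ValidatingWebhookConfiguration looks roughly like this:

        apiVersion: admissionregistration.k8s.io/v1beta1
        kind: ValidatingWebhookConfiguration
        metadata:
          name: tenant-policy-webhook
        webhooks:
        - name: pods.tenant-policy.example.com
          rules:
          - apiGroups: [""]
            apiVersions: ["v1"]
            operations: ["CREATE"]
            resources: ["pods"]
          clientConfig:
            service:
              namespace: policy-system      # hypothetical namespace running the webhook server
              name: tenant-policy-svc       # hypothetical Service in front of it
              path: /validate
            caBundle: <base64-encoded CA certificate>
          failurePolicy: Fail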
  30. PodSecurityPolicy restricts access to the host {filesystem, network, ports, PID namespace,

    IPC namespace}... Limits privileged containers and volume types, enforces read-only filesystems, etc. Enforced through its own admission plugin. (diagram: the Pod spec is checked by the PSP Admission Controller against the PodSecurityPolicy spec, which allows or denies it) 40
  31. PodSecurityPolicy example:

    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: prevent-root-privileged
    spec:
      # Don't allow privileged pods!
      privileged: false
      # Don't allow root containers!
      runAsUser:
        rule: "MustRunAsNonRoot"
      # remaining required rule fields, left permissive here
      seLinux:
        rule: "RunAsAny"
      supplementalGroups:
        rule: "RunAsAny"
      fsGroup:
        rule: "RunAsAny"
    Let developers "use" this policy:
    $ kubectl create role psp:unprivileged \
        --verb=use \
        --resource=podsecuritypolicy \
        --resource-name=prevent-root-privileged
    $ kubectl create rolebinding developers:unprivileged \
        --role=psp:unprivileged \
        --user=[email protected] \
        --user=[email protected]
    A pod like this is REJECTED:
    apiVersion: v1
    kind: Pod
    metadata:
      name: foo
    spec:
      containers:
      - name: pause
        image: k8s.gcr.io/pause
        securityContext:
          privileged: true
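    On Kubernetes Engine, PodSecurityPolicy enforcement also has to be switched on per cluster; at the time of this talk that was a beta gcloud flag (cluster name illustrative, check the current docs):

        gcloud beta container clusters update example-cluster \
            --enable-pod-security-policy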
  32. Network Policy: which pods can talk to which other pods (based on

    their namespace/labels) or IP ranges. Available on Kubernetes Engine with the Calico Network Plugin (--enable-network-policy). 42
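    For example, a cluster created with network policy enforcement enabled (cluster name illustrative):

        gcloud container clusters create example-cluster --enable-network-policy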
  33. Network Policy example: allow traffic to "mysql" pods from "frontend" pods:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: db-allow-frontend
    spec:
      podSelector:
        matchLabels:
          app: mysql
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: frontend
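    Per-namespace isolation typically starts from a default-deny baseline that allow rules like the one above then punch holes into; a common sketch:

        kind: NetworkPolicy
        apiVersion: networking.k8s.io/v1
        metadata:
          name: default-deny-ingress
        spec:
          podSelector: {}            # selects every pod in the namespace
          policyTypes: ["Ingress"]   # no ingress rules listed, so all inbound traffic is denied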
  34. Pod Priority/Preemption (beta – Kubernetes 1.11). Pod Priority: puts high-priority

    pods waiting in Pending state at the front of the scheduling queue. Pod Preemption: evicts lower-priority pod(s) from a node if a high-priority pod cannot be scheduled due to not enough space/resources in the cluster. Use PriorityClasses to define:
    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: "high"
    value: 1000000
    ---
    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: "normal"
    value: 1000
    globalDefault: true
    ---
    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: "low"
    value: 10
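    A pod opts into one of these classes by name; a minimal sketch (pod name and image are placeholders):

        apiVersion: v1
        kind: Pod
        metadata:
          name: important-pod
        spec:
          priorityClassName: "high"
          containers:
          - name: app
            image: k8s.gcr.io/pause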
  35. Resource Quotas Limits total memory/cpu/storage that pods can use, and

    how many objects of each type (pods, load balancers, ConfigMaps, etc.) on a per-namespace basis 48
  36. Resource Quotas – Example:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: staging
    spec:
      hard:
        requests.cpu: "8"
        requests.memory: 2Gi
        limits.cpu: "10"
        limits.memory: 3Gi
        requests.storage: 120Gi
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: object-quota
      namespace: staging
    spec:
      hard:
        pods: "30"
        services: "2"
        services.loadbalancers: "0"
        persistentvolumeclaims: "5"
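    Current consumption against these limits can be inspected with:

        kubectl describe resourcequota compute-quota --namespace=staging
        kubectl describe resourcequota object-quota --namespace=staging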
  37. Resource Quotas + PriorityClass: set different quotas for pods per PriorityClass

    (alpha in Kubernetes 1.11, disabled by default):
    apiVersion: scheduling.k8s.io/v1beta1
    kind: PriorityClass
    metadata:
      name: low
    value: 10
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: low-priority-compute
    spec:
      scopeSelector:
        matchExpressions:
        - operator: In
          scopeName: PriorityClass
          values: ["low"]
      hard:
        pods: "100"
        cpu: "10"
        memory: 12Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: unimportant-pod
    spec:
      containers: [...]
      priorityClassName: low
  38. Limit Range: specify {default, min, max} resource constraints for each pod/container

    per namespace. If a pod spec doesn't specify limits/requests, these defaults are used:
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: default-compute-limits
    spec:
      limits:
      - type: Container   # default/defaultRequest apply per container
        default:
          memory: 128Mi
          cpu: 200m
        defaultRequest:
          memory: 64Mi
          cpu: 100m
  39. Limit Range: specify {default, min, max} resource constraints for each pod/container.

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: compute-limits
    spec:
      limits:
      - type: "Container"
        min:               # a container cannot have less resources than these
          memory: 32Mi
          cpu: 10m
        max:               # a container cannot have more resources than these
          memory: 800Mi
          cpu: "2"
  40. Pod Anti-Affinity: constrain scheduling of pods based on the labels of

    other pods already scheduled on the node. Example ("keep me off of nodes that have pods that don't have the billing label"):
    apiVersion: v1
    kind: Pod
    metadata:
      name: foo
      labels:
        team: "billing"
    spec: ...
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: bar
      labels:
        team: "billing"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchExpressions:
              - key: "team"
                operator: NotIn
                values: ["billing"]
  41. Dedicated Nodes: use taints on nodes and tolerations on Pods to dedicate

    a partition of the cluster to particular pods/users. Useful for partitioning/dedicating special machines on the cluster to the team(s) that asked for it. (diagram: a pool of GPU nodes reserved for the ML team alongside regular nodes) 54
  42. Dedicated Nodes. You can apply "taints" to Kubernetes Engine node pools at creation

    time:
    $ gcloud container node-pools create gpu-pool \
        --cluster=example-cluster \
        --node-taints=team=machine-learning:NoSchedule
    (This is better than the "kubectl taint nodes" command, as it keeps working when node pools resize or nodes are auto-repaired.)
  43. Dedicated Nodes. You can apply "taints" to Kubernetes Engine node pools at creation

    time:
    $ gcloud container node-pools create gpu-pool \
        --cluster=example-cluster \
        --node-taints=team=machine-learning:NoSchedule
    (This is better than the "kubectl taint nodes" command, as it keeps working when node pools resize or nodes are auto-repaired.)
    Use a "toleration" on the pods from this team:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        team: "machine-learning"
    spec:
      tolerations:
      - key: "team"
        operator: "Equal"
        value: "machine-learning"
        effect: "NoSchedule"
  44. Sandboxed Pods Linux kernel bugs and security vulnerabilities may bypass

    container security boundaries. Approaches in this space: • Kata Containers • gVisor (Google’s approach!) Check out talk: IO310-Sandboxing your containers with gVisor 57
  45. gVisor: Google's approach to Sandbox Pods. A sandbox for containers that

    implements Linux system calls in user space. Zero config. Written in Go. (diagram: instead of containers issuing system calls directly to the host kernel and hardware, gVisor exposes a limited set of system calls backed by an independent user-space kernel, giving virtualization-based strong isolation) 58
  46. gVisor on Kubernetes – Architecture. runsc: an OCI runtime powered by

    gVisor. The Sentry (an emulated Linux kernel) is the 1st isolation boundary; seccomp + namespaces are the 2nd isolation boundary. The Gofer handles network and file I/O. (diagram labels: Kubernetes → runsc (OCI) → Sandbox [Container → Sentry, a user-space kernel] → seccomp + ns → Host Linux Kernel / KVM, with the Gofer attached over 9P) 59
  47. Sandbox Pods in Kubernetes (Work In Progress). RuntimeClass is a

    new API to specify runtimes:
    apiVersion: v1alpha1
    kind: RuntimeClass
    metadata:
      name: gvisor
    spec:
      runtimeHandler: gvisor
    ...
    Specify the RuntimeClass in your Pod spec:
    apiVersion: v1
    kind: Pod
    ...
    spec:
      ...
      runtimeClassName: gvisor
  48. Scalable Policy Management. You wrote all these policies, but how do

    you deploy and manage them in practice? Keeping Kubernetes/IAM policies up to date across namespaces / clusters / projects is difficult! (diagram: many projects, clusters, and namespaces, each needing the same policies) 62
  49. Kubernetes Engine Policy Management NEW! (alpha) Centrally defined policies. •

    Single source of truth • ..as opposed to "git" vs "Kubernetes API" vs "Cloud IAM" Applies policies hierarchically. • Organization → Folder → Project → Cluster → Namespace • Policies are inherited. Lets you manage namespaces, RBAC, and more… Check out talk (happening now): IO200-Take Control of your Multi-cluster, Multi-Tenant Kubernetes Workloads Participate in alpha: goog.page.link/kpm-alpha 63
  50. Kubernetes API: • Currently API calls are not rate limited,

    open to DoS from tenants, impacting others. Networking: • Networking is not a scheduled resource in Kubernetes, yet (cannot use with limits/requests) • Tenants can still discover each other via Kubernetes DNS Many more... Kubernetes Multi-tenancy Limitations Today 64
  51. Determine your use case • How trusted are your tenant

    users and workloads? • What degree and kinds of isolation do you need? Namespace-centric multi-tenancy • Utilize Policy objects for scheduling and access control. • Think about personas and map them to RBAC cluster roles. • Automate policies across clusters with GKE Policy Management (alpha). Key Takeaways 65
  52. Kubernetes Multi-tenancy Working Group - https://github.com/kubernetes/community/tree/master/wg-multitenancy - [email protected] - Organizers:

    - David Oppenheimer (@davidopp), Google - Jessie Frazelle (@jessfraz), Microsoft Kubernetes Policy Working Group - https://github.com/kubernetes/community/tree/master/wg-policy - [email protected] Participate! 66 Register your interest at: gke.page.link/multi-tenancy
  53. Thank you. Ahmet Alp Balkan (@ahmetb) Yoshi Tamura (@yoshiat) 67

    Register your interest at: gke.page.link/multi-tenancy
  54. Example: “testing team has 10,000 CPU hours per month” Most

    of the resources are billable on the cloud: • Compute: CPU/memory • Networking: transfer costs, load balancing, reserved IPs • Storage: persistent disks, SSDs • Other services (Cloud PubSub, Cloud SQL, …) provisioned through Service Catalog. Kubernetes doesn't offer a way to do internal chargeback for compute/cloud resources used. Internal Billing/Chargeback 68
  55. Function ldap Date LGTM Notes Speaker(s) ahmetb / yoshiat ahmetb

    → Done (7/19) yoshiat → Peer Reviewer davidopp 7/23 a couple of small remaining comments to resolve, but nothing to block LGTM PR jacinda Legal Design PMM hrdinsky / praveenz Practice Buddy (optional) Approvals & Reviews 69