Hypernetes: Multi-tenant and Secure Kubernetes

This is the talk I gave at LinuxCon EU 2016.

Lei (Harry) Zhang

September 28, 2016

Transcript

  1. About Me
     • Lei (Harry) Zhang
     • #Microsoft MVP in cloud and datacenter management
     • though I’m a Linux guy :/
     • Previous: VMware, Baidu
     • Feature maintainer of Kubernetes
     • HyperCrew: https://hyper.sh
     • Publications: Docker & Kubernetes Under the Hood
     • PhD candidate @ZJU: Large-scale cluster management and scheduling
  2. A survey about “boundary”
     • Are you comfortable with Linux containers as an effective boundary?
     • Yes, I use containers in my private/safe environment
     • No, I use containers to serve the public cloud
  3. As long as we care about security…
     • We have to wrap containers inside full-blown virtual machines
     • But we lose cloud-native deployment
     • Slow startup time
     • Huge waste of resources
     • Memory tax for every container
     • …
     [Diagram labels: dream, reality]
  4. Revisit container
     • Container Runtime
     • The dynamic view and boundary of your running process (namespace, cgroups)
     • Container Image
     • The static view of your program, data, dependencies, files and directories
     • Dockerfile: FROM busybox; ADD temp.txt /; VOLUME /data; CMD ["echo hello"]
     [Diagram: a Docker Container built from that Dockerfile. Read-only layer: /bin /dev /etc /home /lib /lib64 /media /mnt /opt /proc /root /run /sbin /sys /tmp /usr /var, plus /data and /temp.txt, each described by json metadata. Init layer: /etc/hosts, /etc/hostname, /etc/resolv.conf. Read-write layer and /data on top, running “echo hello”.]
  5. HyperContainer
     • Container Runtime
     • RunV
     • https://github.com/hyperhq/runv
     • The OCI-compatible, hypervisor-based runtime implementation
     • Widely adopted by companies like Huawei etc
     • Control daemon
     • https://github.com/hyperhq/hyperd
     • Container Image
     • Docker Image Spec
  6. Combine the best parts
     • Portable and behaves like a Linux container
     • $ hyperctl run -t busybox echo helloworld
     • sub-second startup time*, ~12MB memory cost
     • Fully isolated sandbox with an independent guest kernel
     • $ hyperctl exec -t busybox uname -r
     • 4.4.12-hyper (or your provided kernel)
     • security, backward compatibility, maturity
     See: http://hypercontainer.io/why-hyper.html
  7. HyperContainer is a Pod
     • That’s how HyperContainer fits into the Kubernetes philosophy
     • Wait, why is Pod so important?
  8. Pod: lesson learned from Borg
     • InitContainers: one or more containers started in sequence before the pod's normal containers are started.
     • Share volumes, perform network operations, and perform computation prior to the app containers.
  9. So, Pod is
     • The group of super-affinity containers
     • The atomic scheduling unit
     • The process group in container cloud
     • Do right things
     • without modifying your container image
     • Kubernetes = Spring Framework
     • Pod = IoC
     [Diagram: a Pod containing log, app, infra container, volume, and init container]
  10. Pod is not easy to simulate
      • log and app have super affinity
      • Requirement:
      • app: 1G, log: 0.5G
      • Available:
      • Node_A: 1.25G, Node_B: 2G
      • What happens if app is scheduled to Node_A?
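The question on this slide can be worked through with the slide's own numbers. The following is a minimal sketch, not the Kubernetes scheduler: the function names and the first-fit placement policy are assumptions made for illustration. It contrasts placing `app` and `log` one at a time (with `log` forced to follow `app` because of super affinity) against placing them as one Pod whose request is the sum of both.

```python
# Free memory per node and memory requests per container, in GB (from the slide).
NODES = {"Node_A": 1.25, "Node_B": 2.0}
REQUESTS = {"app": 1.0, "log": 0.5}


def schedule_per_container(nodes, requests):
    """First-fit, one container at a time; later containers must follow the first."""
    free = dict(nodes)
    placement = {}
    pinned = None  # node chosen for the first container (super affinity)
    for name, need in requests.items():
        candidates = [pinned] if pinned else list(free)
        target = next((n for n in candidates if free[n] >= need), None)
        placement[name] = target
        if target is not None:
            free[target] -= need
            pinned = target
    return placement


def schedule_as_pod(nodes, requests):
    """Treat the group as one unit: a node must fit the sum of all requests."""
    total = sum(requests.values())
    target = next((n for n, free in nodes.items() if free >= total), None)
    return {name: target for name in requests}


per_container = schedule_per_container(NODES, REQUESTS)
as_pod = schedule_as_pod(NODES, REQUESTS)
print(per_container)  # app lands on Node_A, log cannot be placed
print(as_pod)         # both land on Node_B
```

Scheduled alone, `app` fits on Node_A and leaves only 0.25G, so `log` (0.5G) has nowhere to go. Scheduled as a unit, the 1.5G total rules out Node_A up front.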
  11. HyperContainer is a Pod
      • Linux container based runtimes
      • wraps and encapsulates several app containers into a logical group
      • Hypervisor container based runtime
      • hypervisor serves as a natural boundary of Pod
  12. HyperContainer is a Pod
      • kubelet Container Runtime Interface
      • create sandbox Foo --> create container C --> start container C
      • stop container C --> remove container C --> delete sandbox Foo
      • Sandbox
      • Normally: the infra container
      • HyperContainer: hypervisor
      • with HyperKernel
      • a HyperStart process as PID 1
      • setup mnt namespace, launch apps from the images etc
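The call ordering on this slide can be sketched as a tiny state machine. This is an illustration only: `FakeRuntime` and its method names are hypothetical and are not the real CRI gRPC API. It just enforces that a sandbox exists before its containers and is deleted only after they are removed.

```python
class FakeRuntime:
    """Hypothetical in-memory runtime; it only records and checks call order."""

    def __init__(self):
        self.sandboxes = {}  # sandbox name -> {container name: state}
        self.calls = []

    def create_sandbox(self, sandbox):
        self.sandboxes[sandbox] = {}
        self.calls.append(f"create sandbox {sandbox}")

    def create_container(self, sandbox, name):
        if sandbox not in self.sandboxes:
            raise RuntimeError("sandbox must exist before its containers")
        self.sandboxes[sandbox][name] = "created"
        self.calls.append(f"create container {name}")

    def start_container(self, sandbox, name):
        self._expect(sandbox, name, "created")
        self.sandboxes[sandbox][name] = "running"
        self.calls.append(f"start container {name}")

    def stop_container(self, sandbox, name):
        self._expect(sandbox, name, "running")
        self.sandboxes[sandbox][name] = "stopped"
        self.calls.append(f"stop container {name}")

    def remove_container(self, sandbox, name):
        self._expect(sandbox, name, "stopped")
        del self.sandboxes[sandbox][name]
        self.calls.append(f"remove container {name}")

    def delete_sandbox(self, sandbox):
        if self.sandboxes[sandbox]:
            raise RuntimeError("remove all containers before deleting the sandbox")
        del self.sandboxes[sandbox]
        self.calls.append(f"delete sandbox {sandbox}")

    def _expect(self, sandbox, name, state):
        actual = self.sandboxes.get(sandbox, {}).get(name)
        if actual != state:
            raise RuntimeError(f"container {name} is {actual}, expected {state}")


runtime = FakeRuntime()
runtime.create_sandbox("Foo")
runtime.create_container("Foo", "C")
runtime.start_container("Foo", "C")
runtime.stop_container("Foo", "C")
runtime.remove_container("Foo", "C")
runtime.delete_sandbox("Foo")
print(" --> ".join(runtime.calls))
```

With a Linux container runtime the sandbox would be the infra container; with HyperContainer it would be the hypervisor itself, which is why the Pod boundary maps onto it so directly.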
  13. Hypernetes
      • Also: h8s
      1. Kubernetes + HyperContainer runtime
         • officially supported by using kubernetes/frakti
      2. Multi-tenant network and persistent volumes
         • battle tested Neutron + Cinder plugin
  14. Multi-tenant Network
      • Goal:
      • leveraging tenant-aware neutron network for Kubernetes
      • following the network plugin workflow
      • Non-goal:
      • break k8s network model or hack k8s code
  15. Define the Network
      • Network
      • a top-level API object
      • each tenant (created by Keystone) has its own Network
      • a Network maps to a Neutron “net”
      • a Network Controller is responsible for managing the Network lifecycle
  16. Example
      [Diagram: api-server, etcd, scheduler, and controller-manager (ControlLoop), with kubelet (SyncLoop) and proxy on each node. API objects: network, pod, replica, namespace, service, job, deployment, volume, petset, … The control loop compares the Desired World with the Real World and calls Neutron to create/delete network.]
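The "Desired World" versus "Real World" loop for Network objects can be sketched as a set difference. This is a hedged illustration, not Hypernetes code: `FakeNeutron`, `reconcile_networks`, and the network names are all hypothetical, and the real controller talks to the Neutron API rather than an in-memory set.

```python
class FakeNeutron:
    """Hypothetical stand-in for a Neutron client; not the real Neutron API."""

    def __init__(self, nets=()):
        self.nets = set(nets)

    def create_network(self, name):
        self.nets.add(name)

    def delete_network(self, name):
        self.nets.discard(name)


def reconcile_networks(desired, neutron):
    """One control-loop pass: make Neutron match the desired Network objects."""
    actions = []
    # Networks declared in the API but missing from Neutron get created.
    for name in sorted(desired - neutron.nets):
        neutron.create_network(name)
        actions.append(("create", name))
    # Networks still present in Neutron but no longer declared get deleted.
    for name in sorted(neutron.nets - desired):
        neutron.delete_network(name)
        actions.append(("delete", name))
    return actions


neutron = FakeNeutron(nets={"tenant-b-net"})
desired = {"tenant-a-net"}
first_pass = reconcile_networks(desired, neutron)
second_pass = reconcile_networks(desired, neutron)
print(first_pass)
print(second_pass)
```

The second pass produces no actions, which is the property a control loop relies on: once the real world matches the desired world, running it again is harmless.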
  17. Kubernetes Network Model
      • Container reach container
      • all containers can communicate with all other containers without NAT
      • Node reach container
      • all nodes can communicate with all containers (and vice-versa) without NAT
      • IP addressing
      • Pod in cluster can be addressed by its IP
  18. How does h8s fit that?
      • Network can be assigned to one or more Namespaces
      • Pods belonging to the same Network can reach each other directly through IP
      • a Pod’s network maps to a Neutron “port”
      • kubelet is responsible for Pod network setup
      • let’s see how kubelet works
  19. Example
      [Diagram: api-server, etcd, and scheduler, with kubelet (SyncLoop) and proxy on each node]
      3.1 New pod object detected
      3.2 Bind pod with node
  20. Example
      [Diagram: api-server, etcd, and scheduler, with kubelet (SyncLoop) and proxy on each node]
      4.1 Detected pod bound to me
      4.2 Start containers in pod
  21. Design of kubelet
      [Diagram: Choose Runtime (docker, rkt, hyper/remote), InitNetworkPlugin, SyncLoop, PodUpdate, PLEG, NodeStatus, Network Status, status Manager, volume Manager, image Manager, HandlePods {Add, Update, Remove, Delete, …}]
      Pod Update Worker (e.g. ADD)
      • generate Pod status
      • check volume status (discussed later)
      • call runtime to start containers
      • set up Pod network (see next slide)
  22. kubestack
      A standalone gRPC daemon
      1. to “translate” the SetUpPod request to the Neutron network API
      2. handling multi-tenant Service proxy
  23. Service
      $ iptables-save | grep my-service
      -A KUBE-SERVICES -d 10.0.0.116/32 -p tcp -m comment --comment "default/my-service: cluster IP" -m tcp --dport 8001 -j KUBE-SVC-KEAUNL7HVWWSEZA6
      -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-6XXFWO3KTRMPKCHZ
      -A KUBE-SVC-KEAUNL7HVWWSEZA6 -m comment --comment "default/my-service:" --mode random -j KUBE-SEP-57KPRZ3JQVENLNBRZ
      -A KUBE-SEP-6XXFWO3KTRMPKCHZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.2:80
      -A KUBE-SEP-57KPRZ3JQVENLNBRZ -p tcp -m comment --comment "default/my-service:" -m tcp -j DNAT --to-destination 172.17.0.3:80
      [Diagram: portal 10.10.0.116:8001, random mode rules, backend rule_1 (172.17.0.2:80), backend rule_2 (172.17.0.3:80); driven by OnServiceUpdate and OnEndpointsUpdate]
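The rule chain above (portal, random-mode rules, DNAT to a backend) can be modelled in a few lines. This is a sketch under stated assumptions: `FakeProxier` and its method names are hypothetical, it does not program iptables, and a uniform random pick stands in for the random-mode rules, whose exact probabilities are not shown on the slide.

```python
import random


class FakeProxier:
    """Hypothetical model of the Service rules; it only rewrites destinations."""

    def __init__(self):
        self.portals = {}   # (cluster IP, port) -> service name
        self.backends = {}  # service name -> list of (pod IP, port)

    def on_service_update(self, services):
        # services: {service name: (cluster IP, port)}
        self.portals = {portal: name for name, portal in services.items()}

    def on_endpoints_update(self, endpoints):
        # endpoints: {service name: [(pod IP, port), ...]}
        self.backends = {name: list(eps) for name, eps in endpoints.items()}

    def dnat(self, dst, rng=random):
        """Return the rewritten destination for a packet sent to dst."""
        name = self.portals.get(dst)
        if name is None or not self.backends.get(name):
            return dst  # not a Service portal, or no backend: leave untouched
        return rng.choice(self.backends[name])  # assumed uniform pick


proxier = FakeProxier()
proxier.on_service_update({"default/my-service": ("10.0.0.116", 8001)})
proxier.on_endpoints_update(
    {"default/my-service": [("172.17.0.2", 80), ("172.17.0.3", 80)]}
)
print(proxier.dnat(("10.0.0.116", 8001)))
```

The two update callbacks mirror the OnServiceUpdate and OnEndpointsUpdate hooks named on the slide: one maintains the portal side, the other the backend side.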
  24. Multi-tenant Service
      • Default iptables-based kube-proxy is not tenant aware
      • Endpoint Pods and Nodes with iptables rules are isolated into different networks
      • Hypernetes uses a built-in HAProxy as the Service portal
      • to handle all Service instances within the same namespace
      • the same OnServiceUpdate and OnEndpointsUpdate workflow
      • ExternalProvider
      • an OpenStack LB will be created as Service
      • e.g. curl 58.215.33.98:8078
  25. Kubernetes Persistent Volume
      [Diagram: the Volume Manager reconciles the desired World; the Cinder volume plugin attaches a volume to a Host path, which is mounted into each Pod at its mountPath]
      • Get mountedVolume from actualStateOfWorld
      • Unmount volumes in mountedVolume but not in desiredStateOfWorld
      • AttachVolume() if vol in desiredStateOfWorld and not attached
      • MountVolume() if vol in desiredStateOfWorld and not in mountedVolume
      • Verify devices that should be detached/unmounted are detached/unmounted
      Tips:
      1. -v host:path
      2. attach vs. mount
      3. Totally independent from container management
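The reconcile steps listed on this slide translate fairly directly into set operations. The sketch below is an illustration, not the kubelet volume manager: `reconcile_volumes` and the volume names are hypothetical, state is kept in plain sets, and the final verification step from the slide is left out.

```python
def reconcile_volumes(desired, attached, mounted):
    """One reconcile pass; mutates attached/mounted and returns the operations."""
    ops = []
    # Unmount volumes that are mounted but no longer desired.
    for vol in sorted(mounted - desired):
        mounted.discard(vol)
        ops.append(("unmount", vol))
    for vol in sorted(desired):
        # AttachVolume() if the volume is desired and not attached.
        if vol not in attached:
            attached.add(vol)
            ops.append(("attach", vol))
        # MountVolume() if the volume is desired and not mounted.
        if vol not in mounted:
            mounted.add(vol)
            ops.append(("mount", vol))
    return ops


desired = {"vol-a", "vol-b"}
attached = {"vol-a", "vol-c"}
mounted = {"vol-c"}
ops = reconcile_volumes(desired, attached, mounted)
for op in ops:
    print(op)
```

Keeping attach and mount as separate operations reflects the second tip on the slide: a volume can be attached to the host without yet being mounted into a Pod.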
  26. Persistent Volume with HyperContainer
      • Enhanced Cinder volume plugin
      • Linux container:
        1. full OpenStack cluster
        2. query Nova to find node
        3. attach Cinder volume to host path
        4. bind mount host path to Pod containers
      • HyperContainer:
      • directly attach block devices to Pod
      • thanks to the hypervisor based Pod boundary
      • eliminates extra time to query Nova
      [Diagram: the Volume Manager reconciles the desired World; the Enhanced Cinder volume plugin attaches vol directly to each Pod's mountPath on the Host]
  27. Future of CRI
      • Keep Docker as the only default container runtime
      • oci-runtime, rktlet, hyperd
      • Frakti: the Remote Container Runtime Kit
      • https://github.com/kubernetes/frakti
      • welcome to try it out, star and fork
  28. “if image becomes non-standard”
      • e.g. Docker image becomes somehow Docker specific
      • Don’t worry, kubelet.imageManager is moving to runtime specific
      • but then k8s will probably choose
      • NO DEFAULT runtime
  29. Full Topology
      [Diagram: a Master holding API objects (Object: Network, Object: Pod, Object: …) alongside KeyStone, Neutron, Cinder, and Ceph; each Node runs kubestack, Neutron L2 Agent, kube-proxy, kubelet, the Cinder Plugin, and Pods]
  30. Summary
      • A new way to build secure and multi-tenant Kubernetes
      • Kubernetes + HyperContainer + Neutron Plugin + Cinder Plugin + Keystone
      • Project URL: https://github.com/hyperhq/hypernetes
      • Roadmap
      • Graduate HyperContainer runtime on k8s upstream
      • see HyperContainer in official k8s release
      • Neutron CNI plugin
      • Tip: https://hyper.sh is totally built on Hypernetes, try it out :)