The Ins and Outs of Networking in Google Container Engine

Tim Hockin

March 10, 2017

Transcript

  1. Kubernetes is about clusters

    Because of that, networking is pretty important
    Most of Kubernetes centers on network concepts
    Our job is to make sure your applications can communicate:
    • With each other
    • With the world outside your cluster
    • Only where you want
  2. It’s easy to get overwhelmed

    Many people are comfortable with TCP/IP, but containers bring new concepts:
    • Namespaces
    • Virtual interfaces
    • IP forwarding
    • Underlays
    • Overlays
    • iptables
    • NAT
    It’s enough to make your head spin
  3. Background: API server

    Kubernetes is a very API-centric system - everything communicates through the API
    • No private APIs
    • No “system only” calls
    REST: Defined in terms of “resources” (nouns, aka “objects”) and methods (verbs)
  4. Background: Pods

    A small group of tightly-coupled containers & volumes, composed together
    The atom of Kubernetes
    Shared lifecycle and fate
    Shared networking - a shared “real” IP, containers see each other as localhost
  5. Background: Controllers

    A piece of code that watches the Kubernetes API and reacts
    The defining pattern of Kubernetes, used everywhere
    Self-healing, aka rectification
    Examples: ReplicaSet, Services, DNS, Kubelet
  6. Background: Labels

    Metadata (key-value) which can be attached to any API resource
    Labels: identification
    • Allow users to define how to group resources
    • Examples: app name, tier (frontend/backend), stage (dev/test/prod)
    Annotations: data that “rides along” with objects
    • Third-party or internal state that isn’t part of an object’s schema
    [Diagram: an object labeled app: store, role: fe, stage: prod]
  7. Background: Selectors

    Expresses which objects to act upon
    • Think “select ... where”
    Provides very loose coupling
    Users can manage groups however they need
    Examples: services, deployments
  8. Background: Selectors

    [Diagram: four objects, all labeled app: store, with (role, stage) labels of (fe, test), (be, test), (fe, prod), and (be, prod)]
  9. Background: Selectors

    [Same diagram, with the selector app=store, role=fe applied]
  10. Background: Selectors

    [Same diagram, with the selector app=store, stage=test applied]
  11. Background: Selectors

    [Same diagram, with the selector app=store applied]
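The selector slides can be read as a simple "every key in the selector must match" test over labels. The sketch below is only an illustration of that idea in Python, not how Kubernetes implements it; the object names (`fe-test`, `be-prod`, and so on) are hypothetical, while the labels and the three selectors come from the slides.

```python
# Equality-based label selection: an object matches when every key/value
# pair in the selector is present in its labels.
# The object names below are hypothetical; the labels come from the slides.
OBJECTS = {
    "fe-test": {"app": "store", "role": "fe", "stage": "test"},
    "be-test": {"app": "store", "role": "be", "stage": "test"},
    "fe-prod": {"app": "store", "role": "fe", "stage": "prod"},
    "be-prod": {"app": "store", "role": "be", "stage": "prod"},
}


def matches(selector, labels):
    """Return True if labels satisfy every key=value pair in selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector, objects):
    """Return the sorted names of all objects whose labels match selector."""
    return sorted(name for name, labels in objects.items()
                  if matches(selector, labels))


if __name__ == "__main__":
    for selector in ({"app": "store", "role": "fe"},
                     {"app": "store", "stage": "test"},
                     {"app": "store"}):
        print(selector, "->", select(selector, OBJECTS))
```

Because the selector only names the labels it cares about, new objects join or leave a group simply by changing their labels, which is the loose coupling the slides describe.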
  12. Every pod has a real IP address

    This is different from the out-of-the-box model Docker offers
    • No machine-private IPs
    • No port-mapping
    Pod IPs are accessible from other pods, regardless of which VM they are on
    Linux “network namespaces” (aka “netns”) and virtual interfaces
  13-17. Life of a packet: pod-to-pod, same node

    [Animated diagram: one VM. The root netns holds eth0, the cbr0 bridge, and vethxx/vethyy; pod1 and pod2 each have their own netns with an eth0. Across the frames, a packet with src: pod1, dst: pod2 is followed from pod1 to pod2.]
  18. Flat network space

    Pods must be reachable across VMs, too
    Kubernetes doesn’t care HOW, but this is a requirement
    • L2, L3, or overlay
    Assign a CIDR (IP block) to each VM
    GCP: Teach the network how to route packets
  19-24. Life of a packet: pod-to-pod, across nodes

    [Animated diagram: VM1 (pod1, pod2) and VM2 (pod3, pod4). Each VM's root netns holds eth0, cbr0, and vethxx/vethyy; each pod has its own eth0. Across the frames, a packet with src: pod1, dst: pod4 is followed out of VM1. The final frame notes:]
    Anti-spoofing: only allow known source IPs (i.e. VMs)
  25. Programming GCP’s network

    GKE automatically sets up routing for you using every trick it needs
    All VMs are created as “routers”
    • --can-ip-forward
    • Disable anti-spoof protection for this VM
    Add one GCP static route for each VM
    • gcloud compute routes create vm2 --destination-range=x.y.z.0/24 --next-hop-instance=vm2
    The GCP network does the rest
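To make the "one CIDR per VM, one static route per VM" idea concrete, here is a minimal Python sketch using the standard `ipaddress` module. It is an illustration only, not GKE's implementation: the cluster range `10.11.0.0/16`, the /24 per-node size, and the function names are assumptions chosen for the example.

```python
import ipaddress


def assign_node_cidrs(cluster_cidr, nodes, prefix_len=24):
    """Give each node the next free block carved out of the cluster range.

    The cluster range and the /24 block size are assumptions for this sketch.
    """
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix_len)
    return {node: next(subnets) for node in nodes}


def next_hop(routes, pod_ip):
    """Mimic a static-route lookup: which VM owns the block containing pod_ip?"""
    address = ipaddress.ip_address(pod_ip)
    for node, cidr in routes.items():
        if address in cidr:
            return node
    return None  # no route: the network would not know where to send it


# One route per VM, analogous to one `gcloud compute routes create` per VM.
ROUTES = assign_node_cidrs("10.11.0.0/16", ["vm1", "vm2", "vm3"])

if __name__ == "__main__":
    for node, cidr in ROUTES.items():
        print(f"route {cidr} -> next hop {node}")
    print(next_hop(ROUTES, "10.11.1.7"))
```

With routes like these in place, the network only needs to know which VM owns a block; delivery to the individual pod happens inside that VM.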
  26-30. Life of a packet: pod-to-pod, across nodes

    [Animated diagram: the same two-VM layout. Across the frames, the packet with src: pod1, dst: pod4 is followed from pod1 on VM1 to pod4 on VM2.]
  31. Dealing with change

    You need something more durable than a pod IP
    A real cluster changes over time:
    • Scale-up and scale-down events
    • Rolling updates
    • Pods crash or hang
    • VMs reboot
    The pod addresses you want to talk to can change without warning
  32. The service abstraction

    A service is a group of endpoints (usually pods)
    Services provide a stable VIP
    VIP automatically routes to backend pods
    • Implementations can vary
    • We will examine the default implementation
    The set of pods “behind” a service can change
    Clients only need the VIP, which doesn’t change
  33. Service

    What you submit is simple
    • Other fields will be defaulted or assigned

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
        spec:
          selector:
            app: store
            role: be
          ports:
          - name: http
            port: 80
  34. Service

    [Same manifest as the previous slide]
    The ‘selector’ field chooses which pods to balance across
  35. Service

    What you get back has more information
    Automatically creates a distributed load balancer

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
          namespace: default
          creationTimestamp: 2016-05-06T19:16:56Z
          resourceVersion: "7"
          selfLink: /api/v1/namespaces/default/services/store-be
          uid: 196d5751-13bf-11e6-9353-42010a800fe3
        spec:
          type: ClusterIP
          selector:
            app: store
            role: be
          clusterIP: 10.9.3.76
          ports:
          - name: http
            protocol: TCP
            port: 80
            targetPort: 80
          sessionAffinity: None
  36. Service

    [Same object as the previous slide]
    The default is to allocate an in-cluster IP
  37-38. Endpoints

    [Diagram: a service with selector app: store, role: be, shown against six pods]
    • app: store, role: be - 10.11.8.67
    • app: store, role: be - 10.11.5.3
    • app: store, role: be - 10.11.0.9
    • app: db, role: be - 10.7.1.18
    • app: store, role: fe - 10.11.8.67
    • app: db, role: be - 10.4.1.11
  39. Endpoints

    When you create a service, a controller wakes up

        kind: Endpoints
        apiVersion: v1
        metadata:
          name: store-be
          namespace: default
        subsets:
        - addresses:
          - ip: 10.11.8.67
          - ip: 10.11.5.3
          - ip: 10.11.0.9
          ports:
          - name: http
            port: 80
            protocol: TCP
  40. Endpoints

    [Same object as the previous slide]
    Holds the IPs of the pod backends
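The step from a service's selector to its Endpoints object can be sketched as a single pass over the pods. The Python below is a simplified illustration under stated assumptions, not the real endpoints controller: `build_endpoints` is a hypothetical helper, and the pod list reuses the labels and IPs shown on the Endpoints diagram slides.

```python
# Pods as (labels, pod IP) pairs, taken from the Endpoints diagram slides.
PODS = [
    ({"app": "store", "role": "be"}, "10.11.8.67"),
    ({"app": "store", "role": "be"}, "10.11.5.3"),
    ({"app": "store", "role": "be"}, "10.11.0.9"),
    ({"app": "db", "role": "be"}, "10.7.1.18"),
    ({"app": "store", "role": "fe"}, "10.11.8.67"),
    ({"app": "db", "role": "be"}, "10.4.1.11"),
]


def build_endpoints(service_name, selector, pods, port=80):
    """Hypothetical helper: collect the IPs of pods matching the selector.

    Returns a dict shaped loosely like the Endpoints object on the slides.
    """
    ips = [ip for labels, ip in pods
           if all(labels.get(key) == value for key, value in selector.items())]
    return {
        "kind": "Endpoints",
        "metadata": {"name": service_name},
        "subsets": [{
            "addresses": [{"ip": ip} for ip in ips],
            "ports": [{"name": "http", "port": port, "protocol": "TCP"}],
        }],
    }


ENDPOINTS = build_endpoints("store-be", {"app": "store", "role": "be"}, PODS)

if __name__ == "__main__":
    for address in ENDPOINTS["subsets"][0]["addresses"]:
        print(address["ip"])
```

A real controller would rerun this whenever pods or services change, which is what keeps the list of backend IPs current.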
  41-44. Life of a packet: pod-to-service

    [Animated diagram: pod1's netns (eth0) attached through vethxx to cbr0 in the root netns, which also holds eth0 and iptables. A packet with src: pod1, dst: svc1 reaches iptables, which rewrites dst: svc1 to dst: pod99 (DNAT, conntrack).]
  45. Conntrack

    Linux kernel connection-tracking
    Remembers address translations
    • Based on the 5-tuple
    Does a lot more, but not very relevant here
    Reversed on the return path

        {
          protocol = TCP
          src_ip = pod1
          src_port = 1234
          dst_ip = svc1
          dst_port = 80
        } => {
          protocol = TCP
          src_ip = pod1
          src_port = 1234
          dst_ip = pod99
          dst_port = 80
        }
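The "remember the translation, reverse it on the way back" behavior can be modeled with a small table keyed by 5-tuple. This Python sketch is a toy model of the idea only; the kernel's conntrack is far more involved, and the `ConnTrack` class and its method names are hypothetical.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ConnTrack:
    """Toy translation table: records each DNAT so replies can be reversed."""

    def __init__(self):
        # Maps the reply tuple we expect from the backend to the reply
        # tuple the client should actually see.
        self._replies = {}

    def dnat(self, packet, backend_ip, backend_port):
        """Rewrite the destination and remember how to undo it."""
        expected_reply = FiveTuple(packet.protocol, backend_ip, backend_port,
                                   packet.src_ip, packet.src_port)
        restored_reply = FiveTuple(packet.protocol, packet.dst_ip,
                                   packet.dst_port, packet.src_ip,
                                   packet.src_port)
        self._replies[expected_reply] = restored_reply
        return packet._replace(dst_ip=backend_ip, dst_port=backend_port)

    def un_dnat(self, reply):
        """Restore the original service address on a tracked reply."""
        return self._replies.get(reply, reply)


if __name__ == "__main__":
    table = ConnTrack()
    outbound = table.dnat(FiveTuple("TCP", "pod1", 1234, "svc1", 80), "pod99", 80)
    print(outbound)
    print(table.un_dnat(FiveTuple("TCP", "pod99", 80, "pod1", 1234)))
```

The client only ever sees the service address, in both directions, even though a different pod answered.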
  46-50. Life of a packet: pod-to-service

    [Animated diagram, continued: the packet leaves as src: pod1, dst: pod99. The reply arrives as src: pod99, dst: pod1; iptables rewrites src: pod99 to src: svc1 (un-DNAT), and the reply is delivered to pod1 as src: svc1, dst: pod1.]
  51. A bit more on iptables

    The iptables rules look scary, but are actually simple:

        if dest.ip == svc1.ip && dest.port == svc1.port {
          pick one of the backends at random
          rewrite destination IP
        }

    Configured by ‘kube-proxy’ - a pod running on each VM
    • Not actually a proxy
    • Not in the data path
    Kube-proxy is a controller - it watches the API for services
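The pseudocode rule on this slide translates almost directly into a runnable form. The sketch below is a Python rendering of that rule for illustration, not kube-proxy's code or real iptables syntax; the dict-based packet, the `apply_service_rule` name, and the sample addresses (borrowed from earlier slides) are assumptions.

```python
import random


def apply_service_rule(packet, service, backends, rng=random):
    """If the packet targets the service VIP:port, rewrite its destination IP.

    Packets and services are plain dicts here purely for illustration.
    """
    if (packet["dst_ip"] == service["ip"]
            and packet["dst_port"] == service["port"]):
        rewritten = dict(packet)                   # leave the input untouched
        rewritten["dst_ip"] = rng.choice(backends)  # pick a backend at random
        return rewritten
    return packet  # not addressed to this service: pass through unchanged


SERVICE = {"ip": "10.9.3.76", "port": 80}
BACKENDS = ["10.11.8.67", "10.11.5.3", "10.11.0.9"]

if __name__ == "__main__":
    packet = {"src_ip": "10.11.2.4", "dst_ip": "10.9.3.76", "dst_port": 80}
    print(apply_service_rule(packet, SERVICE, BACKENDS))
```

Nothing here sits in the data path as a process: the rule is just a match and a rewrite, which is why the slides stress that kube-proxy only configures it.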
  52. DNS

    Even easier: services are added to an in-cluster DNS server
    You would never hardcode an IP, but you might hardcode a hostname and port
    Serves “A” and “SRV” records
    DNS itself runs as pods and a service
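As a rough illustration of what those records look like, the Python sketch below builds the names a client would query. It assumes the conventional `<service>.<namespace>.svc.<cluster-domain>` layout and a cluster domain of `cluster.local`, which is a common default but is configurable; the function names are hypothetical.

```python
def service_a_name(service, namespace, cluster_domain="cluster.local"):
    """Name of the "A" record that resolves to the service's cluster IP.

    The cluster domain is an assumption; clusters can be configured otherwise.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_srv_name(port_name, protocol, service, namespace,
                     cluster_domain="cluster.local"):
    """Name of the "SRV" record for one named port of the service."""
    base = service_a_name(service, namespace, cluster_domain)
    return f"_{port_name}._{protocol.lower()}.{base}"


if __name__ == "__main__":
    print(service_a_name("store-be", "default"))
    print(service_srv_name("http", "TCP", "store-be", "default"))
```

A client that hardcodes such a hostname keeps working when the service's backends, or even its cluster IP, change.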
  53-54. DNS Service

    Requests a particular cluster IP
    Pods are auto-scaled with the cluster size
    Service VIP is stable

        kind: Service
        apiVersion: v1
        metadata:
          name: kube-dns
          namespace: kube-system
        spec:
          clusterIP: 10.0.0.10
          selector:
            k8s-app: kube-dns
          ports:
          - name: dns
            port: 53
            protocol: UDP
          - name: dns-tcp
            port: 53
            protocol: TCP
  55. Simple and powerful

    Can use any port you want, no conflicts
    Can request a particular ‘clusterIP’
    Can remap ports
  56. That’s all there is to it

    Services are an abstraction - the API is a VIP
    No running process or intercepting the data-path
    All a client needs to do is hit the service IP:port
  57. Leaving the GCP project

    VMs get private IPs (in 10.0.0.0/8)
    VMs can have public IPs, too
    GCP: Public IPs are provided by 1-to-1 NAT
  58-63. Life of a packet: VM-to-internet

    [Animated diagram: a VM (root netns, eth0) inside a GCP project, with 1:1 NAT at the project edge. Outbound, a packet with src: VM-internal, dst: 8.8.8.8 has its source rewritten to VM-external. The reply arrives as src: 8.8.8.8, dst: VM-external and has its destination rewritten to VM-internal.]
  64-68. Life of a packet: pod-to-internet

    [Animated diagram: pod1's netns (eth0) attached through vethxx to cbr0 in the VM's root netns (eth0), with 1:1 NAT at the project edge. A packet with src: pod1, dst: 8.8.8.8 reaches the 1:1 NAT and is dropped.]
  69. What went wrong?

    The 1:1 NAT only understands VM IPs
    • Anything else gets dropped
    Pod IPs != VM IPs
    When in doubt, add some more iptables
    • MASQUERADE, aka SNAT
    Applies to any packet with a destination *outside* of 10.0.0.0/8
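The MASQUERADE decision described here boils down to one range check on the destination. The Python sketch below illustrates that check with the standard `ipaddress` module; it is not the actual iptables rule, and the packet dicts, the `masquerade` name, and the VM address `10.128.0.2` are assumptions made for the example.

```python
import ipaddress

# Destinations inside this range are left alone, per the slide.
INTERNAL_RANGE = ipaddress.ip_network("10.0.0.0/8")


def masquerade(packet, vm_internal_ip):
    """Rewrite the source to the VM's IP when the packet leaves 10.0.0.0/8."""
    if ipaddress.ip_address(packet["dst_ip"]) in INTERNAL_RANGE:
        return packet  # internal traffic keeps its real pod source IP
    return {**packet, "src_ip": vm_internal_ip}  # SNAT to the VM's address


if __name__ == "__main__":
    # The VM address below is a made-up example value.
    outbound = {"src_ip": "10.11.8.67", "dst_ip": "8.8.8.8"}
    print(masquerade(outbound, "10.128.0.2"))
    internal = {"src_ip": "10.11.8.67", "dst_ip": "10.11.5.3"}
    print(masquerade(internal, "10.128.0.2"))
```

After this rewrite the 1:1 NAT sees a VM IP it understands, while pod-to-pod traffic inside the range is untouched.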
  70-79. Life of a packet: pod-to-internet

    [Animated diagram: the same layout, now with iptables in the root netns. Outbound, iptables rewrites src: pod1 to src: VM-internal (MASQUERADE), then the 1:1 NAT rewrites src: VM-internal to src: VM-external; dst stays 8.8.8.8. The reply arrives as src: 8.8.8.8, dst: VM-external; the 1:1 NAT rewrites the destination to VM-internal, and iptables rewrites it to pod1.]
  80. Receiving external traffic

    GCP offers multiple products here
    Kubernetes builds on two:
    • Network Load Balancer (L4)
    • HTTP/S Load Balancer (L7)
    These map to Kubernetes APIs:
    • Service type=LoadBalancer
    • Ingress
  81. Service

    Change the type of your service
    Implemented by the cloud provider controller

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
        spec:
          type: LoadBalancer
          selector:
            app: store
            role: be
          ports:
          - name: https
            port: 443
  82. Service

    The LB info is populated when ready

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
          # ...
        spec:
          type: LoadBalancer
          selector:
            app: store
            role: be
          clusterIP: 10.9.3.76
          ports:
            # ...
          sessionAffinity: None
        status:
          loadBalancer:
            ingress:
            - ip: 86.75.30.9
  83-88. Life of a packet: external-to-service

    [Animated diagram: a GCP project with a Net LB in front of VM1, VM2, and VM3, which host pod1, pod2, and pod3. A packet with src: client, dst: LB arrives and the LB chooses a VM. The packet is rejected by the firewall, so GKE runs: gcloud compute firewall-rules create ... After that, the packet with src: client, dst: LB reaches the chosen VM.]
  89. Balancing to VMs

    The LB only knows about VMs
    VMs do not map 1:1 with pods
    [Diagram: VM1, VM2, VM3]
  90. The imbalance problem

    [Diagram: a GCP project with a Net LB in front of VM1, VM2, and VM3, hosting pod1, pod2, and pod3]
    Assume the LB only hits VMs with pods
    The LB only knows about VMs
  91. The imbalance problem

    [Same diagram: the LB sends 50% of traffic to each of two VMs; the pod that is alone on its VM receives 50%, and the two pods sharing a VM receive 25% each]
  92. Balancing to VMs

    The LB only knows about VMs
    VMs do not map 1:1 with pods
    How do we avoid imbalance? iptables, of course
    [Diagram: VM1, VM2, VM3]
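The 50% / 25% / 25% split on the imbalance slide can be reproduced with a little arithmetic. The Python sketch below compares balancing per VM with picking a pod at random across the whole cluster, which is the iptables re-balancing the deck goes on to describe. The pod placement (pod1 alone on one VM, pod2 and pod3 together on another) is an assumption made to match those percentages, and the function names are hypothetical.

```python
from fractions import Fraction

# Assumed placement, chosen so the numbers match the slide's percentages.
PLACEMENT = {"vm1": ["pod1"], "vm2": ["pod2", "pod3"], "vm3": []}


def shares_when_balancing_vms(pods_by_vm):
    """The LB splits evenly across VMs that have pods; each VM splits locally."""
    active = {vm: pods for vm, pods in pods_by_vm.items() if pods}
    per_vm = Fraction(1, len(active))
    return {pod: per_vm / len(pods)
            for pods in active.values() for pod in pods}


def shares_when_balancing_pods(pods_by_vm):
    """Every pod is equally likely, no matter which VM the packet landed on."""
    all_pods = [pod for pods in pods_by_vm.values() for pod in pods]
    return {pod: Fraction(1, len(all_pods)) for pod in all_pods}


if __name__ == "__main__":
    print(shares_when_balancing_vms(PLACEMENT))
    print(shares_when_balancing_pods(PLACEMENT))
```

Choosing among all pods evens out the load, at the cost that the chosen pod may live on a different VM than the one the LB picked.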
  93-102. Life of a packet: external-to-service

    [Animated diagram: the same Net LB layout, now with iptables on the VMs. A packet with src: client, dst: LB arrives at a VM, where iptables chooses a pod and rewrites dst: LB to dst: pod2 (NAT). The packet travels on as src: client, dst: pod2. pod2's reply, src: pod2, dst: client, is marked INVALID.]
  103-113. Life of a packet: external-to-service

    [Animated diagram, second attempt: iptables on VM1 rewrites both addresses, src: client to src: VM1 and dst: LB to dst: pod2 (NAT). The packet travels on as src: VM1, dst: pod2. The reply returns as src: pod2, dst: VM1, where iptables rewrites it to src: LB, dst: client before it leaves for the client.]
  114. Explain the complexity

    To avoid imbalance, we re-balance inside Kubernetes
    A backend is chosen randomly from all pods
    Good:
    • Well balanced, in practice
    Bad:
    • Can cause an extra network hop
    • Hides the client IP from the user’s backend
    Users wanted to make the trade-off themselves
  115. OnlyLocal

    Specify an external-traffic policy
    iptables will always choose a pod on the same node
    Preserves client IP
    Risks imbalance

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
          annotations:
            service.beta.kubernetes.io/external-traffic: OnlyLocal
        spec:
          type: LoadBalancer
          selector:
            app: store
            role: be
          ports:
          - name: https
            port: 443
  116. Opt-in to the imbalance problem

    [Diagram: a GCP project with a Net LB in front of VM1, VM2, and VM3, with iptables on the VMs; traffic splits 50% / 50% between two VMs, and the pods receive 50%, 25%, and 25%]
    In practice Kubernetes spreads pods across nodes
    If pods >> nodes: OK
    If nodes >> pods: OK
    If pods ~= nodes: risk
  117. Life of a packet: external-to-service

    [Diagram: the same Net LB layout. A VM with no backends is not considered: Health-check fails if no backends]
  118-128. Life of a packet: external-to-service

    [Animated diagram: a packet with src: client, dst: LB arrives and the LB chooses a VM. iptables on that VM chooses a pod and rewrites dst: LB to dst: pod2 (DNAT), leaving src: client intact. On the reply, iptables rewrites src: pod2 to src: LB, and the packet returns to the client as src: LB, dst: client.]
  129. Service

    Change the type of your service
    Allocates and forwards a port on every VM to the service port
    Exactly the same data path as the LB case

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
        spec:
          type: NodePort
          selector:
            app: store
            role: be
          ports:
          - name: https
            port: 443
  130-131. Ingress

    A different API resource
    Maps HTTP to services
    Implemented by the cloud provider controller

        kind: Ingress
        apiVersion: extensions/v1beta1
        metadata:
          name: store-ing
        spec:
          rules:
          - http:
              paths:
              - path: /customers
                backend:
                  serviceName: customers-be
              - path: /products
                backend:
                  serviceName: products-be
  132. Ingress

    The LB info is populated when ready

        kind: Ingress
        apiVersion: extensions/v1beta1
        metadata:
          name: store-ing
        spec:
          rules:
          - http:
              paths:
              - path: /customers
                backend:
                  serviceName: customers-be
              - path: /products
                backend:
                  serviceName: products-be
        status:
          loadBalancer:
            ingress:
            - ip: 86.73.50.9
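The "maps HTTP to services" idea can be illustrated as a lookup from request path to backend service. The Python sketch below uses the two rules from the Ingress manifest; the longest-prefix matching is a simplifying assumption of this example, since real ingress controllers define their own path-matching semantics, and `route` is a hypothetical helper.

```python
# (path, serviceName) pairs taken from the Ingress manifest on the slides.
RULES = [
    ("/customers", "customers-be"),
    ("/products", "products-be"),
]


def route(request_path, rules):
    """Return the service for the longest matching path prefix, or None.

    Prefix matching on whole path segments is an assumption of this sketch.
    """
    best = None
    for prefix, service in rules:
        exact = request_path == prefix
        nested = request_path.startswith(prefix.rstrip("/") + "/")
        if (exact or nested) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, service)
    return best[1] if best else None


if __name__ == "__main__":
    for path in ("/products/42", "/customers", "/other"):
        print(path, "->", route(path, RULES))
```

Once a service is chosen this way, the request follows the same service data path described on the earlier slides.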
  133-153. Life of a packet: external-to-ingress

    [Animated diagram: a GCP project with a GCLB in front of VM1, VM2, and VM3, which host pod1 through pod5. A request with src: client, dst: LB, path: /products arrives and the GCLB chooses a VM. The GCLB forwards it as src: GCLB, dst: VM3. iptables on VM3 chooses a pod and rewrites the packet to src: VM3, dst: pod3. The reply returns as src: pod3, dst: VM3; iptables rewrites it to src: VM3, dst: GCLB, and the GCLB answers the client as src: LB, dst: client.]
  154. OnlyLocal

    The same annotation as before
    Configured per-service
    iptables will always choose a pod on the same node
    Risks imbalance
    Removes 2nd hop

        kind: Service
        apiVersion: v1
        metadata:
          name: store-be
          annotations:
            service.beta.kubernetes.io/external-traffic: OnlyLocal
        spec:
          type: NodePort
          selector:
            app: store
            role: be
          ports:
          - name: https
            port: 443
  155. But wait, there’s more!

    Things we didn’t really cover:
    • Pod liveness probes
    • Graceful termination
    • Cloud health checks
    • Firewalls
    • Headless services
    • IPAM
    • SSL
    • ...
  156. Google Container Engine is a moving target

    The efforts of Open Source developers and Google Engineers continue to improve and simplify the system
    Google NEXT ‘18 will have more “ins” and “outs” for network traffic
    Watch this space