
Scaling Kubernetes - Boston DevOps Meetup

Myles Steinhauser

April 24, 2019

Transcript

1. Scaling Kubernetes for the Bose Cloud Platform
   Boston DevOps Meetup, April 24th, 2019
2. Hi, I'm Myles. @masteinhauser
   o I help businesses scale by narrating their strategy and the distributed systems they need to succeed.
   o Senior Cloud Engineer @ Bose
   o I help build and operate an IoT platform for cloud-connected speakers
   o Team of 1 = Gen 1 – Kube 1.5
   o Team of 3 = Gen 2 – Kube 1.8
   o Team of 6 = Gen 3 – Kube 1.15 (next)
3. "Everything that can be invented has been invented." – Charles H. Duell, Commissioner, U.S. Patent Office, 1899
4. Automotive – Why does Bose even need a cloud anyway?
5. (image-only slide)

6. The Bose Kubernetes Platform
   o Custom install, through and through
     o Terraform
     o Ansible
     o Helm (+ static manifests)
   o Custom Kubernetes binaries
     o Kube 1.8.15 + custom CVE backports, Calico v2.6.8
   o 3 Kube masters (API Server, Controller Manager, Scheduler)
     o etcd-main is colocated on the masters (BAD!) - m4.4xlarge
     o etcd-events: 6 nodes, over 3 AZs - m4.4xlarge (split illustrated below)
   o Running on AWS EC2, us-east-1 (Virginia)
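
A rough sketch of how an etcd-main/etcd-events split like the one above is typically wired into kube-apiserver via its --etcd-servers-overrides flag; the hostnames, ports, and image tag below are placeholders, not our actual endpoints:

```yaml
# Fragment of a kube-apiserver static pod manifest (illustrative only;
# endpoints and image tag are made-up placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: k8s.gcr.io/kube-apiserver:v1.8.15
    command:
    - kube-apiserver
    # Primary etcd cluster (etcd-main) for all API objects.
    - --etcd-servers=https://etcd-main-1:2379,https://etcd-main-2:2379,https://etcd-main-3:2379
    # Route the high-churn Event resource to its own cluster (etcd-events)
    # so event traffic cannot starve the main data store.
    - --etcd-servers-overrides=/events#https://etcd-events-1:2379;https://etcd-events-2:2379;https://etcd-events-3:2379
```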
7. (image-only slide)

8. Current Cluster Stats
   • 465 nodes, across many instance types, each in "placement groups" for workload scheduling and segmentation
   • > 6,000 CPU cores
   • > 26 TB of RAM
   • > 6,000 pods (≈13 pods/node, deeply skewed on the low end)
   • ~2,000 Services
   • ~2,000 Namespaces
   Yes, we run databases on Kubernetes. Dozens of them! (taint/toleration sketch below)
   https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
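
A minimal sketch of the taint/toleration pairing the linked docs describe, assuming a hypothetical placement-group=databases node label and taint (not our actual scheme):

```yaml
# Illustrative only: dedicate a "databases" placement group.
# First taint the nodes so ordinary workloads are repelled, e.g.:
#   kubectl taint nodes <node-name> placement-group=databases:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: postgres-0              # hypothetical database pod
spec:
  # Tolerating the taint allows (but does not force) scheduling onto the group...
  tolerations:
  - key: "placement-group"
    operator: "Equal"
    value: "databases"
    effect: "NoSchedule"
  # ...so a nodeSelector is also used to pin the pod there.
  nodeSelector:
    placement-group: databases
  containers:
  - name: postgres
    image: postgres:11
```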
9. Scaling Kubernetes
   • I CANNOT POSSIBLY COVER EVERYTHING. Let's all chat afterwards, instead.
   • Surprisingly possible!
   • The defaults will get you farther than you think
   • Design with the defaults in mind (more, smaller clusters are "safer")
10. Technical Solutions won't fix your Organizational Problems
    • The worst problem you will have is being successful... on a single cluster.
    • We took on strategic technical debt so the business could hit its deliverable (late summer 2018).
    • Technical debt plus organizational fear is what we missed when we (as engineers) made those decisions.
    • Technical debt isn't bad, but it is important for engineers to understand the greater environment in their daily decision making!
    • "Take on mortgages, not payday loans."
11. Technical Solutions won't fix your Organizational Problems
    • If you don't plan for multi-cluster early on (perhaps not Day 1, but Day 30?), you might never get there.
    • Updates and upgrades, of anything in the system, should be commonplace and accepted.
    • Determine what level of failure is acceptable to the business, because something is always raining in the cloud.
12. A messy Cluster is a noisy Pager
    • Rate of churn is the most important factor affecting the stability of the cluster:
      • New Deployments
      • Pods in CrashLoopBackOff
      • Nodes thrashing due to the OOMKiller or shared disk and network I/O
    • A Namespace per service plus a cluster per environment per region is likely an acceptable trade-off.
    • Programmatic RBAC and PodSecurityPolicy management is crucial to securing clusters (this has saved us from multiple CVEs).
      https://github.com/reactiveops/rbac-manager
    • Every app that integrates with the Kube API gets a separate ServiceAccount, and disabling them by default is a big win (sketch below).
    • We have not done it yet, but we are considering deploying a proxy in front of the Kube API for rate limiting, etc.
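
One hedged reading of "disabling them by default": turn off token automounting on each namespace's default ServiceAccount and give every API-integrating app its own, narrowly bound account. The namespace, names, and rules below are hypothetical:

```yaml
# Illustrative sketch: opt the namespace's default ServiceAccount out of
# API access by disabling token automounting...
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-app            # hypothetical namespace
automountServiceAccountToken: false
---
# ...and give the one app that really talks to the Kube API its own,
# explicitly bound ServiceAccount instead.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-controller      # hypothetical app
  namespace: my-app
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-controller
  namespace: my-app
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-controller
  namespace: my-app
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-app-controller
subjects:
- kind: ServiceAccount
  name: my-app-controller
  namespace: my-app
```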
13. Let's talk about "Defaults"
    At v1.14, Kubernetes supports clusters with up to 5000 nodes. More specifically, we support configurations that meet all of the following criteria:
    • No more than 5000 nodes
    • No more than 150000 total pods
    • No more than 300000 total containers
    • No more than 100 pods per node
    You cannot hit all of these limits at once; it's a multi-dimensional constraint optimization problem (for example, 5000 nodes at 100 pods per node would be 500,000 pods, far beyond the 150000-pod ceiling).
    *** None of this includes or talks about the Rate-of-Churn problem! ***
    https://kubernetes.io/docs/setup/cluster-large/
    http://www.tshirtvortex.net/charles-darwin-riding-a-tortoise/
14. Kubernetes Scalability Continuum
    "Kubernetes Scalability: A Multi-Dimensional Analysis" – Maciek Różacki (Google) & Shyam Jeedigunta
    https://kccna18.sched.com/event/GrXy
    https://youtu.be/t_Ww6ELKl4
15. Let's talk about Bottlenecks
    Welcome to the rodeo. Trust me, you're going to need all the metrics and dashboards you can get. This is when all that anomaly detection comes in handy.
    https://applatix.com/making-kubernetes-production-ready-part-2/
    https://openai.com/blog/scaling-kubernetes-to-2500-nodes/
16. Let's talk about Bottlenecks
    etcd:
    • Writes are full-quorum latency bound: slowest node = (network + disk)
    • It gets hit for every write /and/ every read with a full quorum
    • Multiple Watches get reduced down to 1 if they have the same arguments
    • Higher redundancy explicitly reduces performance!
    (monitoring sketch below)
    https://applatix.com/making-kubernetes-production-ready-part-2/
    https://openai.com/blog/scaling-kubernetes-to-2500-nodes/
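
Because full-quorum writes are bounded by the slowest member's disk and network, etcd's own fsync and peer round-trip histograms are the first things worth alerting on. A minimal sketch, assuming Prometheus already scrapes the etcd members; the alert names and thresholds are illustrative:

```yaml
# Illustrative Prometheus alerting rules for the two latencies that bound
# full-quorum writes: disk (WAL fsync) and network (peer round trip).
groups:
- name: etcd-latency
  rules:
  - alert: EtcdSlowDisk
    # 99th-percentile WAL fsync time over 5m; threshold is illustrative.
    expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.5
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "etcd WAL fsync p99 above 500ms on {{ $labels.instance }}"
  - alert: EtcdSlowPeerNetwork
    # 99th-percentile peer round-trip time over 5m.
    expr: histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[5m])) > 0.15
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "etcd peer round-trip p99 above 150ms on {{ $labels.instance }}"
```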
17. Let's talk about Bottlenecks
    kube-apiserver:
    • Easy to scale out, especially independently of the etcd servers when they're not colocated!
    • --target-ram-mb – the biggest lever available to you
    • --max-requests-inflight
    • If you have a significant amount of Kube API integration, dedicated API servers for the Kube control plane and for your integrations are very helpful. (flag sketch below)
    kube-controller-manager:
    • Oh so many flags that can be tuned.
    https://applatix.com/making-kubernetes-production-ready-part-2/
    https://openai.com/blog/scaling-kubernetes-to-2500-nodes/
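
A sketch of where those two kube-apiserver levers live, again as a static pod command fragment; the values are illustrative, not our production settings:

```yaml
# Fragment of a kube-apiserver static pod spec; values are examples only.
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Used to size the API server's internal caches (e.g. the watch cache);
    # the biggest single lever for large clusters.
    - --target-ram-mb=16384
    # Caps concurrent non-mutating requests before the server sheds load.
    - --max-requests-inflight=800
    # Mutating requests have their own, separate cap.
    - --max-mutating-requests-inflight=400
```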
19. Let's talk about Bottlenecks
    kube-proxy: "It couldn't possibly be a DNS problem."
    • A transparent load balancer running on every Kube worker to assist with service discovery and east-west request handling.
    • Helpful until it breaks in dark and mysterious ways:
      • conntrack table failures
      • Delayed updates
      • Uncoordinated updates
      • iptables locking with the CNI plugin
    (conntrack/iptables tuning sketch below)
    https://applatix.com/making-kubernetes-production-ready-part-2/
    https://openai.com/blog/scaling-kubernetes-to-2500-nodes/
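
A hedged sketch of the knobs involved, using the KubeProxyConfiguration API; the values are examples, and whether they help depends heavily on the workload and kernel:

```yaml
# Illustrative kube-proxy configuration; values are examples, not recommendations.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
conntrack:
  # Size the conntrack table per CPU core and set a floor, so busy nodes
  # don't start dropping tracked connections.
  maxPerCore: 131072
  min: 524288
  # Shorter timeouts reclaim entries from idle/half-closed connections sooner.
  tcpEstablishedTimeout: 86400s
  tcpCloseWaitTimeout: 3600s
iptables:
  # How often rules are re-synced; too aggressive a sync can contend for the
  # iptables lock with the CNI plugin.
  syncPeriod: 30s
  minSyncPeriod: 10s
```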
20. Automotive – At Bose, we're obsessed with performance on what matters most: the little details that make a big difference and the big details that astonish.
21. BUTWHATABOUT
    Kubernetes overall is:
    • largely no different from most traditional application stacks and data planes.
    • Load balancers like HAProxy or Traefik still come with their own, non-Kube performance knowledge.
    • Applications within Pods scale and behave largely the same as on a traditional system.
    • Except for Java… sometimes (JVM sizing sketch below)
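
The Java caveat is usually an older JVM sizing its heap from the node's RAM rather than the container's cgroup limit. A sketch of the common workaround, assuming JDK 8u191+/10+ container-support flags; the app name and values are illustrative:

```yaml
# Illustrative only: pair container memory limits with JVM flags so the heap
# is sized from the cgroup limit instead of the node's total RAM.
apiVersion: v1
kind: Pod
metadata:
  name: java-service            # hypothetical app
spec:
  containers:
  - name: app
    image: openjdk:11-jre-slim
    env:
    - name: JAVA_TOOL_OPTIONS
      # UseContainerSupport is on by default in JDK 10+; MaxRAMPercentage
      # keeps the heap comfortably under the 2Gi limit below.
      value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "2Gi"
        cpu: "2"
```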
22. Takeaways
    • PLEASE, whatever you decide, don't build it yourself in 2018 and beyond.
      • kubeadm has come a long way (sketch below)
    • Use a vendor when you can; GKE is what I recommend personally (even though we don't use it).
      • Shout-out to Sarah @ ReactiveOps
      • Many, many vendors in this space right now
    • Kubernetes *will* scale better than you might expect... or want it to.
    • Remember to focus on the business differentiators: does operating and scaling Kubernetes actually make sense for your BUSINESS?
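
For a sense of how far kubeadm has come, a minimal control-plane definition now fits in a few lines. A sketch only; the version, endpoint, and subnets are placeholders:

```yaml
# Illustrative kubeadm ClusterConfiguration (used with `kubeadm init --config=...`);
# version, endpoint, and subnets are placeholders.
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
controlPlaneEndpoint: "api.example.internal:6443"
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
```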
23. (image-only slide)