SLAC2024: 1001 Ways to Shoot Yourself in the Foot with On-Prem Kubernetes

Containers and associated orchestration platforms such as Docker and Kubernetes have become an integral part of the modern world of IT operations - provided you are running in the public cloud. But the further you move away from the public cloud, the more you enter a world of uncertainty and pain. With a bare metal setup, you are often trapped in a thicket of provisioning and deployment, storage and networking. In this talk, I share the lessons we learned while building our own on-premise Kubernetes platform - including all the pitfalls we stumbled upon and how to successfully run Kubernetes on your own metal.

Martin Helmich

May 06, 2024

Transcript

  1. 1001 WAYS TO SHOOT YOURSELF IN THE FOOT WITH ON-PREM KUBERNETES

    MARTIN HELMICH, @mittwald, May 6th, 2024 (image: CC-BY Anja Vrečko)
  2. MARTIN HELMICH

    Head of Architecture & Developer Relations. Lecturer, Software Engineering & Cloud Computing. Sci-Fi-Nerd, Metalhead, Amateur Woodworker.
  3. PREVIOUSLY ON MARTIN'S ADVENTURES WITH KUBERNETES (SLAC 2018) [SKIP RECAP]

    https://speakerdeck.com/martinhelmich/slac18-php-in-the-container-cloud
  4. YOUR MILEAGE MAY VARY (YMMV)

    CAUTION: This presentation summarizes our OWN LEARNINGS from running Kubernetes on-premise. Not all learnings might apply to your own use case.
  5. OPEN SOURCE

    CAUTION: All learnings in this presentation come from using OPEN SOURCE components ONLY (because that is what we prefer). Commercial alternatives may exist.
  6. BASIC KUBERNETES ARCHITECTURE (diagram)

    CONTROL PLANE: ETCD, API SERVER, CONTROLLER MANAGER, SCHEDULER
    NODE: KUBE PROXY, KUBELET, PODS
  7. > ./kube-apiserver \
       --bind-address=0.0.0.0 \
       --apiserver-count=3 \
       --authorization-mode=Node,RBAC \
       --etcd-servers=http://10.0.0.1:2379,http://10.0.0.2:2379,http://... \
       --[one million more flags]

     > ./kube-scheduler --config=/etc/kubernetes/config/kube-scheduler.yaml
     > ./kube-controller-manager \
       --kubeconfig=/var/lib/kubernetes/kube-controller-manager.kubeconfig \
       --[even more flags]
  8. “IT’S ALL JUST YAML FILES” / “THE CONTROL PLANE IS NOT THAT COMPLICATED”

    NETWORKING, CLUSTER AUTOSCALING, ETCD
  9. (diagram) Two NODES, each running KUBE PROXY and several PODS.

    Traffic paths: INTER-POD, INTER-NODE, and INGRESS from the OUTSIDE WORLD.
  10. INTER-POD networking options: MANUAL ROUTING, CALICO, CILIUM, FLANNEL, WEAVE, AWS VPC, AZURE CNI

    RECOMMENDATION: Choose a CNI provider that fits your use case. Apart from the cloud-provider-specific solutions (like AWS VPC), these usually work both on-premise and in the cloud. We went with Calico and are considering a migration to Cilium in the future.
  11. (diagram) Two NODES, each running KUBE PROXY and several PODS.

    Traffic paths: INTER-POD, INTER-NODE, and INGRESS from the OUTSIDE WORLD.
  12. (diagram) OUTSIDE WORLD → ??? → INGRESS → KUBE PROXY → POD

    A LOAD BALANCER service provides a publicly routed IP address to a SERVICE object.
  13. apiVersion: v1
      kind: Service
      metadata:
        name: example-service
      spec:
        type: LoadBalancer
        selector:
          app: example
        ports:
          - port: 80
            targetPort: 8080

      What ACTUALLY HAPPENS when you create a LoadBalancer service? 🤔
  14. WAIT, IT’S ALL JUST IPTABLES? ALWAYS HAS BEEN.

    RECOMMENDATION: Use IPVS (instead of iptables) in large clusters.
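
    A minimal sketch of what the IPVS recommendation could look like in practice, assuming kube-proxy reads a KubeProxyConfiguration file (in kubeadm-based clusters this usually lives in the kube-proxy ConfigMap) and that the IPVS kernel modules are available on the nodes. The round-robin scheduler is only an illustrative choice, not something the talk prescribes.

    ```yaml
    # Sketch only: kube-proxy configuration that selects IPVS mode.
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"        # switches kube-proxy from its iptables default to IPVS
    ipvs:
      scheduler: "rr"   # round-robin; an assumption, pick what fits your traffic
    ```

    The motivation: in iptables mode, Service rules are evaluated sequentially, whereas IPVS keeps its backends in hash tables, so lookups stay cheap as the number of Services grows.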
  15. BASIC KUBERNETES ARCHITECTURE (diagram)

    CONTROL PLANE: ETCD, API SERVER, CONTROLLER MANAGER, SCHEDULER
    NODE: KUBE PROXY, KUBELET, PODS
  16. BASIC KUBERNETES ARCHITECTURE (diagram)

    CONTROL PLANE: ETCD, API SERVER, CONTROLLER MANAGER, SCHEDULER, CLOUD CONTROLLER MANAGER
    CLOUD PROVIDER: CLUSTER EXTERNAL NETWORKING RESOURCES
  17. RECOMMENDATION: In the public cloud, it’s often easier to limit yourself to stateless deployments and use database resources offered by the cloud provider (like AWS RDS). On-prem, you need to deploy your stateful workloads SOMEWHERE and need to think about providing STORAGE for these workloads.

    EXAMPLES:
    - Databases
    - User uploads (TYPO3 fileadmin/, WordPress wp-content/)
    - Web hosting use case: entire applications that are deployed via direct upload
  18. BASIC KUBERNETES ARCHITECTURE (diagram)

    CONTROL PLANE: ETCD, API SERVER, CONTROLLER MANAGER, SCHEDULER, CLOUD CONTROLLER MANAGER
    CLOUD PROVIDER
  19. BASIC KUBERNETES ARCHITECTURE (diagram)

    CONTROL PLANE: ETCD, API SERVER, CONTROLLER MANAGER, SCHEDULER, CLOUD CONTROLLER MANAGER
    CLOUD PROVIDER
    CSI DRIVER
  20. MIND YOUR WORKLOAD (image: eloneo, https://pixabay.com/photos/hard-drive-technology-computers-1265259/)

    RECOMMENDATION: Consider the actual performance and access mode requirements of your application workload. Cloud-native solutions (e.g. using object storage instead of a shared file system) are preferable, but require support by the application.
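
    To make "access mode requirements" concrete, here is a hedged sketch of a PersistentVolumeClaim for a shared upload directory of the kind listed on slide 17. The claim name, the storage class `example-shared-fs`, and the requested size are hypothetical placeholders; `ReadWriteMany` only works if the CSI driver behind the storage class actually supports it.

    ```yaml
    # Sketch only: a claim for a volume that several pods mount read-write.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-uploads                 # hypothetical name
    spec:
      accessModes:
        - ReadWriteMany                     # shared file system semantics;
                                            # use ReadWriteOnce for single-node volumes
      storageClassName: example-shared-fs   # hypothetical storage class
      resources:
        requests:
          storage: 10Gi                     # placeholder size
    ```

    A database would more typically request `ReadWriteOnce` on fast block storage, which is why the access mode and the performance profile are worth deciding per workload.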
  21. (comparison) etcd on networked block storage vs. etcd on local SSD

    RECOMMENDATION: Storage performance quickly becomes a bottleneck for ETCD in larger clusters. Use local storage whenever possible.
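
    One way this could look in a kubeadm-based setup with stacked etcd (an assumption; the talk does not show its configuration) is to point the etcd data directory at a path backed by a local SSD. The `v1beta3` API version is also an assumption and depends on the kubeadm release in use; newer releases use `v1beta4`.

    ```yaml
    # Sketch only: kubeadm ClusterConfiguration fragment for a stacked etcd member.
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    etcd:
      local:
        dataDir: /var/lib/etcd   # assumed to be mounted from a local SSD,
                                 # not from networked block storage
    ```

    etcd syncs its write-ahead log to disk on every write, so disk latency feeds directly into API server latency; the etcd documentation suggests keeping the 99th percentile of WAL fsync duration below roughly 10 ms.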
  22. (diagram) Gardener / ORGANIZATIONAL BOUNDARY / BARE METAL SERVERS

    TRADE-OFF: Vertical organisational boundaries through your tech stack may be necessary to manage complexity, but they increase communication overhead (which is annoying in failure scenarios and may increase TTR).
  23. KEEP IT SIMPLE, STUPID. (KISS)

    (image: Ralph_PH, CC-BY)
  24. BARE METAL SERVERS

    RECOMMENDATION (YMMV): Eliminate components in your stack that you do not actually need for offering your service. In our case, we ultimately decided to move away from OpenStack and complex cluster provisioning solutions like Gardener.
  25. OUR APPROACH: KUBERNETES-NATIVE EVERYTHING (REMINDER: YMMV)

    (diagram labels) STORAGE, VIRTUALIZATION, NETWORKING, MetalLB, metal-stack, BARE METAL SERVERS

    RECOMMENDATION: Keep to a single ecosystem (in our case: Kubernetes) to keep complexity manageable.

    RECOMMENDATION: Established, battle-proven tools (looking at you, kubeadm) may be sufficient to get the job done.
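
    As an illustration of the Kubernetes-native networking piece, the following is a minimal sketch of how MetalLB could hand out addresses to LoadBalancer services such as `example-service` from slide 13. It assumes a MetalLB release that is configured through custom resources, Layer 2 mode, and the default `metallb-system` namespace. The resource names and the address range (taken from the 192.0.2.0/24 documentation block) are placeholders, not the speaker's actual setup.

    ```yaml
    # Sketch only: an address pool plus a Layer 2 announcement for it.
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: example-pool            # hypothetical name
      namespace: metallb-system
    spec:
      addresses:
        - 192.0.2.10-192.0.2.50     # placeholder range; use routable addresses you own
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: example-l2              # hypothetical name
      namespace: metallb-system
    spec:
      ipAddressPools:
        - example-pool              # announce only addresses from the pool above
    ```

    With something like this in place, a Service of type LoadBalancer would be assigned an address from the pool instead of remaining in the pending state that is typical on clusters without a cloud controller manager.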
  26. OUTLOOK

    There are a few projects (all in early stages of maturity) that also try to automate deploying Kubernetes on bare metal: https://metal-stack.io/ and https://github.com/onmetal
  27. ACCEPT COMPLEXITY WHERE YOU NEED IT (AND MANAGE ACCORDINGLY).

    KEEP IT SIMPLE WHERE YOU DON'T.
    KNOW YOUR PRODUCT.