
HPC Saudi 2022 - Towards Cloud Native HPC

Walid
September 28, 2022

Inspired by the cloud native community and CNCF Research end users such as CERN, the University of Michigan, and many others, Nora Alwadah and I extended the bridge to the Saudi HPC community with our small contribution.

Key takeaway: Follow and join the new Kubernetes Batch Working Group. Help it flourish and evolve.

Transcript

  1. Non-Business Use

    Adoption of Public Clouds in HPC sites: 13% in 2011, 74% in 2018. Source: Hyperion Research study “Cloud Computing Comes of Age”, 2019.
  2. Microservices

    Architectural design that breaks an application into independent, loosely coupled, individually deployable services.
    • Portability was a challenge.
  3. Containers

    Bundling of an application and all its dependencies as a package to be deployed regardless of environment.
  4. Orchestration

    Automation of the operational effort required to run the lifecycle of a container, its workloads, and its services.
    • Provisioning, deployment, scaling (up and down), networking, load balancing, and more.
    • Enabling DevOps and CI/CD.
  5. Google & Linux Foundation Project. Founded in 2015. Advance Container Technology.

    • App Definition & Development: Database, Streaming & Messaging, App Def & Image building, CI/CD
    • Orchestration & Management: Scheduling & Orchestration, Coordination & Service Discovery, Remote Procedure Call, Service Proxy, API Gateway, Service Mesh
    • Runtime: Cloud Native Storage, Container Runtime, Cloud Native Network
    • Provisioning: Automation & Configuration, Container Registry, Security & Compliance, Key Management
    • Special: Kubernetes Certified Service Provider, Kubernetes Training Partner
    • Platform: Certified Kubernetes Distribution, Host, Installer
    • Observability & Analysis: Monitoring, Logging, Tracing, Chaos Engineering, Continuous Optimization
    • Serverless
  6. Google & Linux Foundation Project. Founded in 2015. Advance Container Technology.

    Same landscape categories as the previous slide, overlaid with High Performance Computing: Scheduling, Observability, Storage, Network, UX.
  7. Evolution Timeline: Cloud Native, Distributed Cloud, Distributed Cloud Native

    Timeline axis: 2011, 2015 to 2022. Expanded upon chart from https://bit.ly/FrontiersCloudNative
    • Cycle Computing: running cloud HPC around 8 regions
    • Kubernetes: CNCF launched v1.0 GA
    • Google Kubernetes Engine (GKE)
    • Huawei Cloud Container Engine (CCE)
    • CERN: 1000 node POC
    • Slurmnetes: batch scheduling failed attempts
    • KubeFlow: machine learning framework for operations, pipelines, training & deployment
    • KubeEdge: CNCF’s first intelligent edge computing project
    • Volcano: CNCF’s first batch scheduling project
    • MindSpore: deep learning framework for mobile, edge, and cloud scenarios
    • Karmada: CNCF’s first multi-cloud container orchestration project
    • Kueue: Kubernetes-native job queueing
  8. HPC Cloud Adoption Challenges: Special Hardware

    • Network latency, e.g. specialized InfiniBand (IB)
    • GPUs, accelerators, NUMA, etc.
    • CPU architecture and topology
    TOP500
  9. HPC Cloud Adoption Challenges: Data Gravity

    • Data governance
    • Data residency
    • Egress cost
    • The higher the availability, the higher the cost
    Diagram labels: Services, Data, Apps, Throughput, Latency
  10. HPC Cloud Adoption Challenges: Paradigm Shift

    • Both learning and adoption
    • Distributing workload as images (registry)
    Diagram labels: Kubernetes Control Plane, K8s Kubelet (three nodes), Image Registry, Persistent Storage
  11. Research End User: CERN (https://bit.ly/HPCSAUDI-cern-org)

    CERN is the European Organization for Nuclear Research.
    • Kubernetes use case: Particle Physics
    • Experimented with virtualization early to enable ease of management and automation.
    • 2017: first Kubernetes POC, 1000 worker nodes
    • Data: 330 PB
    • Hybrid on-demand infra
    • 3hrs > 15 min
  12. Public Cloud Use Cases: “Focus on your application and results”

    • Dynamically provision resources
    • Plans, schedules, and executes
    • Fully managed “Serverless”
    • Free
    • Integration with AWS services
    2020 Statistics:
    • Largest cluster: 1,243,000 vCPUs
    • Largest container image: 30 GB
    • No. of simultaneous jobs: 500,000
    • Customers: thousands (1000s)
  13. The CNCF Community

    “It's very hard right now to justify developing a new product in-house. There is really no real reason to keep doing that. It's much easier for us to try it out, and if we see it's a good solution, we try to reach out to the community and start working with that community.”
  14. Where to next?

    • Kubernetes Batch HPC Day North America 2022
    • SC22 Containers and New Orchestration Paradigms for Isolated Environments in HPC
    • CNCF Research User Group
    • CNCF Technical Advisory Group for Runtime
    • Kubernetes Community: Batch WorkGroup
    • CNCF Batch System Initiative Working Group
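
The slides describe distributing HPC workloads as container images and scheduling them as batch work on Kubernetes. As a rough illustration only, the sketch below builds a Kubernetes `batch/v1` Job manifest as a plain Python dictionary and serializes it to JSON. The job name, image reference (`registry.example.com/hpc/sim:1.0`), command, and the helper `build_batch_job` are all hypothetical and do not come from the talk; the manifest fields (`parallelism`, `completions`, `backoffLimit`, `restartPolicy`) are standard Job fields.

```python
import json


def build_batch_job(name, image, command, parallelism=4, completions=4):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration; it only builds data and does
    not talk to any cluster.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # How many pods may run at the same time.
            "parallelism": parallelism,
            # How many pods must finish successfully for the Job to complete.
            "completions": completions,
            # Retries allowed before the Job is marked as failed.
            "backoffLimit": 2,
            "template": {
                "spec": {
                    # Job pods must use Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            # Workload shipped as an image pulled from a registry.
                            "image": image,
                            "command": list(command),
                        }
                    ],
                }
            },
        },
    }


# Assumed example values, not taken from the slides.
manifest = build_batch_job(
    name="hpc-sim",
    image="registry.example.com/hpc/sim:1.0",
    command=["./simulate", "--steps", "1000"],
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Since `kubectl` accepts JSON as well as YAML manifests, the printed output could in principle be piped to `kubectl apply -f -` against a test cluster. Queueing layers such as Kueue or Volcano, mentioned in the timeline, add their own configuration on top of this and are not shown here.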