Deploying multi-GPU workloads on Kubernetes in Python

The RAPIDS suite of open-source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.

Dask is an open-source library which provides advanced parallelism for Python by breaking functions into a task graph that can be evaluated by a task scheduler that has many workers.
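
As a minimal sketch of that model, dask.delayed records ordinary Python calls as a task graph and only executes them when a scheduler is asked for the result:

    import dask

    @dask.delayed
    def inc(x):
        return x + 1

    @dask.delayed
    def add(a, b):
        return a + b

    # Calling the functions only builds a task graph; nothing runs yet.
    total = add(inc(1), inc(2))

    # compute() hands the graph to a scheduler, which runs the tasks on its workers.
    print(total.compute())  # 5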

By using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines. In this talk, we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads on your cluster.

Jacob Tomlinson

August 17, 2023
Transcript

  1. Deploying Multi-GPU Workloads on Kubernetes in Python. EuroSciPy, August 2023. Jacob Tomlinson, RAPIDS Cloud Deployment Lead, NVIDIA.
  2. (slide with no text)
  3. Minor Code Changes for Major Benefits: Abstracting Accelerated Compute through Familiar Interfaces
     CPU (pandas):
       In [1]: import pandas as pd
       In [2]: df = pd.read_csv('filepath')
     GPU (cuDF), average speed-up 150x:
       In [1]: import cudf
       In [2]: df = cudf.read_csv('filepath')
     CPU (scikit-learn):
       In [1]: from sklearn.ensemble import RandomForestClassifier
       In [2]: clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
       In [3]: clf.fit(x, y)
     GPU (cuML), average speed-up 250x:
       In [1]: from cuml.ensemble import RandomForestClassifier
       In [2]: cuclf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
       In [3]: cuclf.fit(x, y)
     CPU (NetworkX):
       In [1]: import networkx as nx
       In [2]: page_rank = nx.pagerank(graph)
     GPU (cuGraph), average speed-up 50x:
       In [1]: import cugraph
       In [2]: page_rank = cugraph.pagerank(graph)
  4. Lightning-Fast End-to-End Performance: Reducing Data Science Processes from Hours to Seconds
     • 16 A100s provide more power than 100 CPU nodes
     • 20x more cost-effective than a similar CPU configuration
     • 70x faster performance than a similar CPU configuration
     *CPU approximate to n1-highmem-8 (8 vCPUs, 52GB memory) on Google Cloud Platform. TCO calculations based on Cloud instance costs.
  5. Dask: a general-purpose Python library for parallelism
     • Scales existing libraries, like NumPy, Pandas, and Scikit-Learn
     • Flexible enough to build complex and custom systems
     • Accessible for beginners, secure and trusted for institutions
     Jacob Tomlinson, Dask Core Developer
  6. Dask accelerates the existing Python ecosystem: built alongside the current community
     NumPy:
       import numpy as np
       x = np.ones((1000, 1000))
       x + x.T - x.mean(axis=0)
     Pandas:
       import pandas as pd
       df = pd.read_csv("file.csv")
       df.groupby("x").y.mean()
     Scikit-Learn:
       from sklearn.linear_model import LogisticRegression
       lr = LogisticRegression()
       lr.fit(data, labels)
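     A minimal sketch of those same operations expressed with Dask collections (dask.array, dask.dataframe, and dask-ml's LogisticRegression are the usual counterparts; dask-ml is a separate install, and the file name plus the data/labels variables are placeholders just as on the slide):
       import dask.array as da
       import dask.dataframe as dd
       from dask_ml.linear_model import LogisticRegression

       # Chunked array: same NumPy-style expression, evaluated lazily in parallel
       x = da.ones((1000, 1000), chunks=(250, 250))
       (x + x.T - x.mean(axis=0)).compute()

       # Partitioned dataframe: same pandas-style groupby
       df = dd.read_csv("file.csv")
       df.groupby("x").y.mean().compute()

       # dask-ml mirrors the scikit-learn estimator API
       lr = LogisticRegression()
       lr.fit(data, labels)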
  7. Open Source Software Has Democratized Data Science: Highly Accessible, Easy to Use Tools Abstract Complexity
     • Data Preparation / Pre-Processing: pandas
     • Model Training / Machine Learning: scikit-learn
     • Graph Analytics: NetworkX
     • Deep Learning: TensorFlow, PyTorch, MxNet
     • Visualization: matplotlib
     All running on Apache Spark / Dask, in CPU memory.
  8. Accelerated Data Science with RAPIDS: Powering Popular Data Science Ecosystems with NVIDIA GPUs
     • Data Preparation / Pre-Processing: cuIO & cuDF
     • Model Training / Machine Learning: cuML, XGBoost
     • Graph Analytics: cuGraph
     • Deep Learning: TensorFlow, PyTorch, MxNet
     • Visualization: cuXfilter, pyViz, Plotly
     All running on Dask (Spark / Dask), in GPU memory.
  9. XGBoost + RAPIDS: Better Together
     • RAPIDS comes paired with XGBoost 1.6.0
     • XGBoost provides zero-copy data import from cuDF, CuPy, Numba, PyTorch and more
     • Official Dask API makes it easy to scale to multiple nodes or multiple GPUs
     • GPU tree builder delivers huge perf gains
     • Now supports Learning to Rank, categorical variables, and SHAP Explainability
     • Use models directly in Triton for high-performance inference
     "XGBoost is All You Need" – Bojan Tunguz, 4x Kaggle Grandmaster
     All RAPIDS changes are integrated upstream and provided to all XGBoost users – via pypi or RAPIDS conda.
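     A minimal sketch of multi-GPU training through XGBoost's Dask API (assumes a running Dask cluster with GPU workers and dask_cudf-backed data; the scheduler address, file name, and column name are placeholders):
       import xgboost as xgb
       import dask_cudf
       from dask.distributed import Client

       client = Client("tcp://scheduler-address:8786")  # placeholder address

       # Data loaded as GPU dataframes, partitioned across the workers
       df = dask_cudf.read_csv("train.csv")
       X, y = df.drop(columns=["target"]), df["target"]

       # DaskDMatrix references the distributed partitions without copying them back
       dtrain = xgb.dask.DaskDMatrix(client, X, y)

       # GPU tree builder; training runs on each worker's GPU
       output = xgb.dask.train(
           client,
           {"tree_method": "gpu_hist", "objective": "binary:logistic"},
           dtrain,
           num_boost_round=100,
       )
       booster = output["booster"]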
  10. RAPIDS in the Cloud: Current Focus Areas
     • NVIDIA DGX Cloud
     • Kubernetes: Helm Charts, Operator, Kubeflow
     • Cloud ML Platforms: Amazon Sagemaker Studio, Google Vertex AI
     • Cloud Compute: Amazon EC2, ECS, Fargate, EKS; Google Compute Engine, Dataproc, GKE
     • Cloud ML examples gallery
     • New deployment documentation website
     Deployment Documentation: docs.rapids.ai/deployment/stable
     Kubernetes Deployment: docs.rapids.ai/deployment/stable/platforms/kubernetes.html
     Dask Kubernetes: kubernetes.dask.org
  11. Live Demo
     Murphy's First Law: Anything that can go wrong will go wrong.
     Murphy's Second Law: Nothing is as easy as it looks.
     Murphy's Third Law: Everything takes longer than you think it will.
  12. Launch a Kubernetes Cluster
       # Launch a Kubernetes cluster with GPUs
       $ gcloud container clusters create jtomlinson-rapids-demo \
           --accelerator type=nvidia-tesla-a100,count=2 \
           --machine-type a2-highgpu-2g \
           --zone us-central1-c
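     A likely follow-up step (not shown on the slide; assumes the default gcloud project is already configured) is pointing kubectl at the new cluster:
       # Fetch credentials so kubectl talks to the cluster created above
       $ gcloud container clusters get-credentials jtomlinson-rapids-demo \
           --zone us-central1-c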
  13. Install NVIDIA Drivers
       # Install the NVIDIA drivers
       $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
  14. Install the Dask Operator
       # Install the Dask operator
       $ helm install --repo https://helm.dask.org \
           --create-namespace -n dask-operator \
           --generate-name dask-kubernetes-operator
  15. Installing the operator
       # Check that we can list daskcluster resources
       $ kubectl get daskclusters
       No resources found in default namespace.

       # Check that the operator pod is running
       $ kubectl get pods -A -l application=dask-kubernetes-operator
       NAMESPACE       NAME                                        READY   STATUS    RESTARTS   AGE
       dask-operator   dask-kubernetes-operator-775b8bbbd5-zdrf7   1/1     Running   0          74s

       # 🚀 done!
  16. Create RAPIDS Clusters with kubectl
     The Dask Operator has some custom resource types that you can create via kubectl, e.g.:
     • DaskCluster to create whole clusters.
     • DaskWorkerGroup to create additional groups of workers with various configurations (high memory, GPUs, etc).
     • DaskJob to run end-to-end tasks like a Kubernetes Job but with an adjacent Dask Cluster.
       # cluster.yaml
       apiVersion: kubernetes.dask.org/v1
       kind: DaskCluster
       metadata:
         name: simple-cluster
       spec:
         worker:
           replicas: 3
           spec:
             containers:
               - name: worker
                 image: "ghcr.io/dask/dask:latest"
                 imagePullPolicy: "IfNotPresent"
                 args:
                   - dask-worker
                   - --name
                   - $(DASK_WORKER_NAME)
         scheduler:
           spec:
             containers:
               - name: scheduler
                 image: "ghcr.io/dask/dask:latest"
                 imagePullPolicy: "IfNotPresent"
                 args:
                   - dask-scheduler
                 ports:
                   - name: tcp-comm
                     containerPort: 8786
                     protocol: TCP
                   - name: http-dashboard
                     containerPort: 8787
                     protocol: TCP
                 readinessProbe:
                   httpGet:
                     port: http-dashboard
                     path: /health
                   initialDelaySeconds: 5
       …
     Tip: Use dask kubernetes gen cluster to generate this YAML for you.
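     A minimal sketch of applying that manifest and checking what the operator creates (assumes the YAML above is saved as cluster.yaml; the service and label names follow the operator's usual <name>-scheduler and dask.org/cluster-name conventions, so adjust if your version differs):
       # Create the DaskCluster resource; the operator spins up the scheduler and worker pods
       $ kubectl apply -f cluster.yaml

       # Watch the pods come up
       $ kubectl get pods -l dask.org/cluster-name=simple-cluster

       # Port-forward the scheduler service to reach the dashboard locally on port 8787
       $ kubectl port-forward svc/simple-cluster-scheduler 8787:8787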
  17. Create RAPIDS Clusters within Notebooks
     With on prem or cloud-managed Kubernetes.
       # Install dask-kubernetes
       $ pip install dask-kubernetes

       # Launch a cluster
       >>> from dask_kubernetes.operator import KubeCluster
       >>> cluster = KubeCluster(name="rapids")

       # List the DaskCluster custom resource that was created for us under the hood
       $ kubectl get daskclusters
       NAME     AGE
       rapids   6m3s
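     A minimal sketch of actually using that cluster from the same notebook (scaling and the client connection follow the standard dask.distributed API; the worker count and array sizes are arbitrary):
       >>> from dask.distributed import Client

       # Add workers, then connect a client to the cluster's scheduler
       >>> cluster.scale(4)
       >>> client = Client(cluster)

       # Any Dask work submitted through this client now runs on the Kubernetes pods
       >>> import dask.array as da
       >>> da.random.random((10000, 10000)).mean().compute()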
  18. Get a Jupyter notebook on your Dask cluster
       # Create a cluster with Jupyter running alongside the scheduler
       $ dask kubernetes gen cluster \
           --name rapids \
           --image rapidsai/notebooks:23.08-cuda12.0-py3.10 \
           --worker-command dask-cuda-worker \
           --resources='{"limits": {"nvidia.com/gpu": "1"}}' \
           --jupyter \
           | kubectl apply -f -
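     One way to reach that notebook (assuming the scheduler's --jupyter flag serves JupyterLab under the dashboard's /jupyter route, as in recent Dask releases, and that the service follows the <name>-scheduler convention):
       # Forward the scheduler's dashboard port to your machine
       $ kubectl port-forward svc/rapids-scheduler 8787:8787

       # JupyterLab should then be reachable under the dashboard, e.g.
       # http://localhost:8787/jupyter/lab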
  19. Parallel HPO: Computational Parallelism Beyond a Single Node
     On a single GCP T4 instance (LocalCUDACluster, one cuda-worker per GPU):
       X, y = …  # NumPy Arrays

       # Optimize in parallel on your Dask cluster
       with parallel_backend("dask"):
           study.optimize(lambda trial: objective(trial, X, y),
                          n_trials=100,
                          n_jobs=4)  # N GPUs on system
     On a GKE cluster with GPU pods (KubeCluster, one cuda-worker per GPU pod):
       X, y = …  # NumPy Arrays

       # Optimize in parallel on your Dask cluster
       with parallel_backend("dask"):
           study.optimize(lambda trial: objective(trial, X, y),
                          n_trials=100,
                          n_jobs=20)  # N GPUs on K8s cluster
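     A minimal sketch of the pieces those snippets assume (Optuna for the study, joblib's Dask backend, and a Dask client attached to the cluster; the data loader, objective body, and train_and_score helper are hypothetical placeholders):
       import optuna
       from joblib import parallel_backend
       from dask.distributed import Client

       client = Client(cluster)  # cluster from LocalCUDACluster() or KubeCluster(...)

       X, y = load_training_data()  # hypothetical loader returning NumPy arrays

       def objective(trial, X, y):
           # Placeholder objective: tune one hyperparameter and return a score
           max_depth = trial.suggest_int("max_depth", 2, 12)
           return train_and_score(X, y, max_depth=max_depth)  # hypothetical helper

       study = optuna.create_study(direction="maximize")

       # Each trial is dispatched to the Dask workers via joblib
       with parallel_backend("dask"):
           study.optimize(lambda trial: objective(trial, X, y),
                          n_trials=100, n_jobs=4)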
  20. Example Notebook
     You can find all the code for this parallel HPO example in our deployment docs repo.
     https://docs.rapids.ai/deployment/stable/examples/xgboost-gpu-hpo-job-parallel-k8s/notebook/
  21. (slide with no text)

  22. How to Get Started with RAPIDS: A Variety of Ways to Get Up & Running
     More about RAPIDS:
     • Learn more at RAPIDS.ai
     • Read the API docs
     • Check out the RAPIDS blog
     • Read the NVIDIA DevBlog
     Self-Start Resources:
     • Get started with RAPIDS
     • Deploy on the Cloud today
     • Start with Google Colab
     • Look at the cheat sheets
     Discussion & Support:
     • Check the RAPIDS GitHub
     • Use the NVIDIA Forums
     • Reach out on Slack
     • Talk to NVIDIA Services
     Get Engaged: @RAPIDSai | https://rapids.ai | https://github.com/rapidsai | https://rapids-goai.slack.com/join