Deploying multi-GPU workloads on Kubernetes in Python

The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.

Dask provides advanced parallelism for Python by breaking functions into a task graph that can be evaluated by a task scheduler that has many workers.
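
As a rough illustration of that model, here is a minimal sketch using dask.delayed; the function names and values are only illustrative, not taken from the talk:

    import dask

    @dask.delayed
    def inc(x):
        # Each call adds a node to the task graph instead of running immediately
        return x + 1

    @dask.delayed
    def add(x, y):
        return x + y

    # Build the graph lazily...
    a = inc(1)
    b = inc(2)
    total = add(a, b)

    # ...then hand it to a scheduler, which can run independent tasks (a and b) in parallel
    print(total.compute())  # 5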

By using Dask to scale out RAPIDS workloads on Kubernetes, you can accelerate them across many GPUs on many machines. In this talk we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads.

Jacob Tomlinson

February 02, 2023

Transcript

  1. Deploying Multi-GPU Workloads on Kubernetes in Python. PyData DC,
     Feb 2023. Jacob Tomlinson, Software Engineering Lead, NVIDIA.
  2. Minor Code Changes for Major Benefits: Abstracting Accelerated Compute
     through Familiar Interfaces.

     CPU (pandas, scikit-learn, NetworkX):

         In [1]: import pandas as pd
         In [2]: df = pd.read_csv('filepath')

         In [1]: from sklearn.ensemble import RandomForestClassifier
         In [2]: clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
         In [3]: clf.fit(x, y)

         In [1]: import networkx as nx
         In [2]: page_rank = nx.pagerank(graph)

     GPU (cuDF, cuML, cuGraph):

         In [1]: import cudf
         In [2]: df = cudf.read_csv('filepath')

         In [1]: from cuml.ensemble import RandomForestClassifier
         In [2]: cuclf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
         In [3]: cuclf.fit(x, y)

         In [1]: import cugraph
         In [2]: page_rank = cugraph.pagerank(graph)

     Average speed-ups: cuDF 150x, cuML 250x, cuGraph 50x.
  3. Lightning-Fast End-to-End Performance

     Reducing data science processes from hours to seconds:
     • 16 A100s provide more power than 100 CPU nodes
     • 20x more cost-effective than a similar CPU configuration
     • 70x faster performance than a similar CPU configuration

     *CPU approximate to n1-highmem-8 (8 vCPUs, 52 GB memory) on Google Cloud
     Platform. TCO calculations based on Cloud instance costs.
  4. Dask

     • General-purpose Python library for parallelism
     • Scales existing libraries, like NumPy, Pandas, and Scikit-Learn
     • Flexible enough to build complex and custom systems
     • Accessible for beginners, secure and trusted for institutions

     Jacob Tomlinson, Dask Core Developer
  5. Dask accelerates the existing Python ecosystem, built alongside the
     current community.

     NumPy:

         import numpy as np
         x = np.ones((1000, 1000))
         x + x.T - x.mean(axis=0)

     Pandas:

         import pandas as pd
         df = pd.read_csv("file.csv")
         df.groupby("x").y.mean()

     Scikit-Learn:

         from sklearn.linear_model import LogisticRegression
         lr = LogisticRegression()
         lr.fit(data, labels)
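     For comparison, a minimal sketch of what the Dask equivalents of the snippets
     above can look like; the chunk and partition sizes here are illustrative:

         import dask.array as da
         import dask.dataframe as dd

         # NumPy-like API, split into chunks that can be computed in parallel
         x = da.ones((1000, 1000), chunks=(250, 250))
         (x + x.T - x.mean(axis=0)).compute()

         # pandas-like API over many partitions
         df = dd.read_csv("file.csv")
         df.groupby("x").y.mean().compute()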
  6. Open Source Software Has Democratized Data Science

     Highly accessible, easy-to-use tools abstract complexity. The typical CPU
     pipeline (data in CPU memory, scaled out with Apache Spark / Dask):
     • Data Preparation / Pre-Processing: pandas
     • Model Training: scikit-learn (machine learning), NetworkX (graph analytics),
       TensorFlow, PyTorch, MxNet (deep learning)
     • Visualization: matplotlib
  7. Accelerated Data Science with RAPIDS

     Powering popular data science ecosystems with NVIDIA GPUs. The same
     pipeline on GPUs (data in GPU memory, scaled out with Dask / Spark):
     • Data Preparation / Pre-Processing: cuIO & cuDF
     • Model Training: cuML, XGBoost (machine learning), cuGraph (graph analytics),
       TensorFlow, PyTorch, MxNet (deep learning)
     • Visualization: cuXfilter, pyViz, Plotly
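     To make the Dask-plus-RAPIDS combination concrete, a minimal sketch using
     dask_cudf, assuming a GPU-enabled environment with RAPIDS installed; the file
     path is a placeholder:

         import dask_cudf

         # cuDF DataFrame partitioned across the GPUs attached to the Dask cluster
         ddf = dask_cudf.read_csv("data/*.csv")

         # The same pandas-style API, executed on data held in GPU memory
         print(ddf.groupby("x").y.mean().compute())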
  8. XGBoost + RAPIDS: Better Together

     • RAPIDS comes paired with XGBoost 1.6.0
     • XGBoost provides zero-copy data import from cuDF, CuPy, Numba, PyTorch and more
     • Official Dask API makes it easy to scale to multiple nodes or multiple GPUs
     • GPU tree builder delivers huge performance gains
     • Now supports Learning to Rank, categorical variables, and SHAP explainability
     • Use models directly in Triton for high-performance inference

     "XGBoost is All You Need" – Bojan Tunguz, 4x Kaggle Grandmaster

     All RAPIDS changes are integrated upstream and provided to all XGBoost users,
     via PyPI or RAPIDS conda.
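     To illustrate the official Dask API mentioned above, a minimal sketch of
     distributed GPU training with xgboost.dask; the toy data, local cluster, and
     parameter values are illustrative stand-ins:

         import dask.array as da
         import xgboost as xgb
         from dask.distributed import Client

         # Any Dask cluster works here; the talk points this at GPU workers on Kubernetes
         client = Client()

         # Toy data as Dask arrays; in practice this would be dask_cudf / dask.dataframe data
         X = da.random.random((10_000, 20), chunks=(2_500, 20))
         y = da.random.random((10_000,), chunks=(2_500,))

         dtrain = xgb.dask.DaskDMatrix(client, X, y)

         output = xgb.dask.train(
             client,
             # "gpu_hist" selects the GPU tree builder; use "hist" to run the same code on CPUs
             {"tree_method": "gpu_hist", "objective": "reg:squarederror"},
             dtrain,
             num_boost_round=100,
         )
         booster = output["booster"]  # trained model; per-round metrics are in output["history"]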
  9. RAPIDS in the Cloud

     Current focus areas:
     • Kubernetes: Helm Charts, Operator, Kubeflow
     • Cloud ML Platforms: Amazon SageMaker Studio, Google Vertex AI
     • Cloud Compute: Amazon EC2, ECS, Fargate, EKS; Google Compute Engine, Dataproc, GKE
     • Cloud ML examples gallery
     • New deployment documentation website

     Deployment Documentation: docs.rapids.ai/deployment/stable
     Kubernetes Deployment: docs.rapids.ai/deployment/stable/platforms/kubernetes.html
     Dask Kubernetes: kubernetes.dask.org
  10. Live Demo

      Murphy's First Law: Anything that can go wrong will go wrong.
      Murphy's Second Law: Nothing is as easy as it looks.
      Murphy's Third Law: Everything takes longer than you think it will.
  11. Launch a Kubernetes Cluster

      # Launch a Kubernetes cluster with GPUs
      $ gcloud container clusters create jtomlinson-rapids-demo \
          --accelerator type=nvidia-tesla-a100,count=2 \
          --machine-type a2-highgpu-2g \
          --zone us-central1-c
  12. Install NVIDIA Drivers

      # Install the NVIDIA drivers
      $ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
  13. Install the Dask Operator

      # Install the Dask Operator
      $ helm install --repo https://helm.dask.org \
          --create-namespace -n dask-operator \
          --generate-name dask-kubernetes-operator
  14. Installing the Operator

      # Check that we can list daskcluster resources
      $ kubectl get daskclusters
      No resources found in default namespace.

      # Check that the operator pod is running
      $ kubectl get pods -A -l application=dask-kubernetes-operator
      NAMESPACE       NAME                                        READY   STATUS    RESTARTS   AGE
      dask-operator   dask-kubernetes-operator-775b8bbbd5-zdrf7   1/1     Running   0          74s

      # 🚀 done!
  15. Get a Jupyter Notebook

      # Create a notebook Pod for us to drive the workload from
      $ kubectl apply -f notebook.yaml

      Source for notebook.yaml:
      https://gist.github.com/jacobtomlinson/397b277e6cc4b717d9ff04759f350b4a#file-notebook-yaml
  16. Create RAPIDS Clusters within Notebooks

      With on-prem or cloud-managed Kubernetes.

      # Install dask-kubernetes
      $ pip install dask-kubernetes

      # Launch a cluster
      >>> from dask_kubernetes.operator import KubeCluster
      >>> cluster = KubeCluster(name="demo")

      # List the DaskCluster custom resource that was created for us under the hood
      $ kubectl get daskclusters
      NAME           AGE
      demo-cluster   6m3s
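      Once the cluster object exists, driving it follows the usual Dask pattern;
      a brief sketch, where the worker count and toy task are illustrative:

          >>> from dask.distributed import Client

          # Point a Dask client at the cluster we just created
          >>> client = Client(cluster)

          # Ask the operator for more workers; it creates the extra worker Pods for us
          >>> cluster.scale(5)

          # Work submitted through this client now runs on the Kubernetes-hosted workers
          >>> client.submit(lambda x: x + 1, 10).result()
          11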
  17. Create RAPIDS Clusters with kubectl

      The Dask Operator has three custom resource types that you can create via kubectl:
      • DaskCluster to create whole clusters.
      • DaskWorkerGroup to create additional groups of workers with various configurations (high memory, GPUs, etc).
      • DaskJob to run end-to-end tasks like a Kubernetes Job but with an adjacent Dask cluster.

      # cluster.yaml
      apiVersion: kubernetes.dask.org/v1
      kind: DaskCluster
      metadata:
        name: simple-cluster
      spec:
        worker:
          replicas: 3
          spec:
            containers:
              - name: worker
                image: "ghcr.io/dask/dask:latest"
                imagePullPolicy: "IfNotPresent"
                args:
                  - dask-worker
                  - --name
                  - $(DASK_WORKER_NAME)
        scheduler:
          spec:
            containers:
              - name: scheduler
                image: "ghcr.io/dask/dask:latest"
                imagePullPolicy: "IfNotPresent"
                args:
                  - dask-scheduler
                ports:
                  - name: tcp-comm
                    containerPort: 8786
                    protocol: TCP
                  - name: http-dashboard
                    containerPort: 8787
                    protocol: TCP
                readinessProbe:
                  httpGet:
                    port: http-dashboard
                    path: /health
                  initialDelaySeconds: 5
      …
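      For completeness, a hedged sketch of connecting to a cluster created this way
      from Python. The scheduler Service name below assumes the operator's usual
      <cluster-name>-scheduler naming convention, so check it against the
      dask-kubernetes docs for your version:

          >>> from dask.distributed import Client

          # From a Pod in the same namespace, connect to the scheduler Service
          # the operator created for the DaskCluster above (name is assumed here)
          >>> client = Client("tcp://simple-cluster-scheduler:8786")
          >>> client.submit(sum, [1, 2, 3]).result()
          6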
  18. Parallel HPO: Computational Parallelism Beyond a Single Node

      On a single GCP T4 instance, a LocalCUDACluster with one cuda-worker per GPU:

          X, y = …  # NumPy Arrays

          # Optimize in parallel on your Dask cluster
          with parallel_backend("dask"):
              study.optimize(lambda trial: objective(trial, X, y),
                             n_trials=100,
                             n_jobs=4)  # Number of GPUs on the system

      On a GKE cluster with GPU Pods, a KubeCluster with one cuda-worker per GPU:

          X, y = …  # NumPy Arrays

          # Optimize in parallel on your Dask cluster
          with parallel_backend("dask"):
              study.optimize(lambda trial: objective(trial, X, y),
                             n_trials=100,
                             n_jobs=20)  # Number of GPUs on the K8s cluster
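      Filling in the parts elided on the slide, a rough sketch of this pattern end to
      end. The objective function, toy data, and scikit-learn estimator are
      illustrative stand-ins (the talk runs GPU estimators on cuda-workers), and how
      n_jobs is parallelised can depend on the Optuna version:

          import joblib
          import optuna
          from dask.distributed import Client
          from sklearn.datasets import make_classification
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.model_selection import cross_val_score

          # Local cluster for illustration; the talk points this at a KubeCluster with GPU workers
          client = Client()

          X, y = make_classification(n_samples=10_000, n_features=20)

          def objective(trial, X, y):
              # One hyperparameter combination per trial
              params = {
                  "n_estimators": trial.suggest_int("n_estimators", 50, 300),
                  "max_depth": trial.suggest_int("max_depth", 2, 16),
              }
              model = RandomForestClassifier(**params)
              return cross_val_score(model, X, y, cv=3).mean()

          study = optuna.create_study(direction="maximize")

          # Route the parallel trials through the Dask workers via joblib
          with joblib.parallel_backend("dask"):
              study.optimize(lambda trial: objective(trial, X, y), n_trials=100, n_jobs=20)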
  19. (no text on this slide)

  20. How to Get Started with RAPIDS: A Variety of Ways to Get Up & Running

      More about RAPIDS
      • Learn more at RAPIDS.ai
      • Read the API docs
      • Check out the RAPIDS blog
      • Read the NVIDIA DevBlog

      Self-Start Resources
      • Get started with RAPIDS
      • Deploy on the Cloud today
      • Start with Google Colab
      • Look at the cheat sheets

      Discussion & Support
      • Check the RAPIDS GitHub
      • Use the NVIDIA Forums
      • Reach out on Slack
      • Talk to NVIDIA Services

      Get Engaged
      @RAPIDSai
      https://github.com/rapidsai
      https://rapids-goai.slack.com/join
      https://rapids.ai