

Reducing the Cost of your Data Science Workloads on the Cloud

By leveraging cloud computing resources, you can pay for just the computing power you need, when you need it. Additionally, GPU acceleration can significantly decrease the amount of time you need computing resources, reducing your overall cost.

We'll discuss RAPIDS, an open-source collection of Python libraries that offer exceptional speed in data science tasks. RAPIDS provides familiar APIs from popular PyData libraries, making it simple to run your data science workloads in the cloud. By using RAPIDS, you can scale your workload and get things done faster and more efficiently.

Jacob Tomlinson

March 19, 2024


Transcript

  1. Reducing the Cost of your Data Science Workloads on the Cloud. Jacob Tomlinson, Senior Software Engineer, RAPIDS. GTC 2024.
  2. Types of Cost. Many types of cost can be reduced with accelerated computing:
     • Infrastructure cost: the hardware and infrastructure required to perform business operations. Measured in dollars.
     • Human cost: the time and effort people need to put in to achieve a goal. Measured in hours and dollars.
     • Computational cost: how much compute power is required to perform a specific operation. Measured in watts.
     • Environmental cost: the environmental costs associated with all of the above. Measured in grams of CO2.
  3. Example: Accelerating pandas with cudf.pandas
     • pandas is the most popular PyData dataframe library
     • pandas is great (but slow)
     • Why is it slow?
       o Largely single-threaded
       o Not a query engine
     • Many alternatives:
       o Faster underlying implementations (C++, Rust, CUDA)
       o Query engines
       o SQL-inspired interfaces
       o Distributed computing
       o Hardware acceleration (GPUs)
     Results of the H2O.ai benchmark maintained by DuckDB: https://duckdblabs.github.io/db-benchmark/
  4. What is cudf.pandas?
     • Lets you keep using pandas
       o Accelerates it on the GPU with no changes
     • 100% of the pandas API
       o Uses the GPU for supported operations
       o Falls back to the CPU otherwise
     • Third-party code acceleration
       o Everything is accelerated; no one changes their code
     Jupyter/IPython: %load_ext cudf.pandas
     Command line: python -m cudf.pandas script.py
     Direct import: import cudf.pandas; cudf.pandas.install()
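All three entry points wrap unchanged pandas code. A minimal sketch of the zero-code-change workflow: the script below is plain pandas, and running it with `python -m cudf.pandas` accelerates it on a machine with cudf installed and an NVIDIA GPU, while it still runs as ordinary CPU pandas everywhere else. The data here is made up for illustration.

```python
# script.py: plain pandas code, no GPU-specific changes.
# Run as `python script.py` for regular CPU pandas, or as
# `python -m cudf.pandas script.py` to accelerate on an NVIDIA GPU
# (requires the cudf package; unsupported operations fall back to CPU).
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# A typical groupby aggregation: one of the operation classes that
# cudf.pandas dispatches to the GPU when supported.
means = df.groupby("group")["value"].mean()
print(means["a"])  # 3.0, i.e. (1 + 3 + 5) / 3
```

The same pattern applies to notebooks: `%load_ext cudf.pandas` before the first `import pandas` leaves the rest of the notebook untouched.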
  5. 150x Faster pandas with Zero Code Change
     DuckDB Data Benchmark, 5 GB. Performance comparison between traditional pandas v1.5 on an Intel Xeon Platinum 8480CL CPU and pandas v1.5 with RAPIDS cuDF on NVIDIA Grace Hopper.
     Source: https://developer.nvidia.com/blog/rapids-cudf-accelerates-pandas-nearly-150x-with-zero-code-changes/
  6. RAPIDS Deployment Models. Scales from sharing GPUs to leveraging many GPUs at once:
     • Single Node: scale up interactive data science sessions with NVIDIA accelerated tools like cudf.pandas
     • Multi Node: scale out processing and training by leveraging GPU acceleration in distributed frameworks like Dask and Spark
     • Shared Node: scale out AI/ML APIs and model serving with NVIDIA Triton Inference Server and the Forest Inference Library
  7. RAPIDS in the Cloud: Current Focus Areas
     • NVIDIA DGX™ Cloud
     • Kubernetes: Helm Charts, Operator, Kubeflow
     • Cloud AI/ML Platforms: Amazon SageMaker Studio, Google Vertex AI
     • Cloud Compute: Amazon EC2, ECS, Fargate, EKS; Google Compute Engine, Dataproc, GKE
     • AI and Machine Learning examples gallery
     RAPIDS Deployment documentation website: docs.rapids.ai/deployment/stable
  8. RAPIDS on Managed Notebook Platforms: serverless Jupyter in the cloud. Example screenshot from the Vertex AI documentation: https://docs.rapids.ai/deployment/stable/cloud/gcp/vertex-ai/
  9. RAPIDS on Compute Pipelines: data processing services. Example from the AWS EMR documentation: https://docs.nvidia.com/spark-rapids/user-guide/latest/getting-started/aws-emr.html
  10. RAPIDS on Virtual Machines: servers and workstations in the cloud. Example from the Azure Virtual Machine documentation: https://docs.rapids.ai/deployment/stable/cloud/azure/azure-vm/
  11. RAPIDS on Kubernetes: Unified Cloud Deployments. [Diagram: a Kubernetes cluster with the GPU Operator managing many GPUs.]
  12. RAPIDS runs your workloads faster. How do you want to spend those gains?
     • Reduce cost: run servers for less time. Beneficial for reducing cloud costs.
     • Do more work: run more workloads for the same time/cost. Process things that were not possible before.
     • Performance boost: get work done faster. May give a competitive advantage or reduce pressure on SLAs.
     • Environmental impact: reduce the power needed to perform the same calculation. Using less power produces less CO2.
     • Reduce context switching: shorter waits for calculations help people avoid switching to a different task.
     • Improve accuracy: acceleration can allow more iterations or more data, leading to improved model accuracy.
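A back-of-the-envelope sketch of the first two options. Every number below (instance prices, speedup, runtime) is hypothetical; real cloud rates and speedups vary widely.

```python
# Hypothetical trade-off: a 10x speedup on an instance that costs
# 3x more per hour. All figures are illustrative, not quoted prices.
cpu_rate = 1.00     # $/hour for a CPU instance (assumed)
gpu_rate = 3.00     # $/hour for a GPU instance (assumed)
speedup = 10.0      # workload runs 10x faster on the GPU (assumed)

cpu_hours = 8.0                      # original runtime on the CPU
gpu_hours = cpu_hours / speedup      # 0.8 hours on the GPU

cpu_cost = cpu_rate * cpu_hours      # $8.00
gpu_cost = gpu_rate * gpu_hours      # about $2.40

# "Reduce cost": same work for less money.
savings = cpu_cost - gpu_cost        # about $5.60 per run
# "Do more work": workloads you can run for the original budget.
extra_runs = cpu_cost / gpu_cost     # roughly 3.3x the work, same spend
print(round(savings, 2), round(extra_runs, 1))
```

The point of the sketch: even when the accelerated instance is several times more expensive per hour, a larger speedup factor means you pay for far fewer hours, and you choose whether to pocket the savings or reinvest them in more work.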
  13. Lightning-Fast End-to-End Performance: Reducing Data Science Processes from Hours to Seconds
     • 16 A100s provide more power than 100 CPU nodes
     • 20x more cost-effective than a similar CPU configuration
     • 70x faster performance than a similar CPU configuration
     *CPU approximate to n1-highmem-8 (8 vCPUs, 52 GB memory) on Google Cloud Platform. TCO calculations based on cloud instance costs.
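The 70x speedup and 20x cost-effectiveness figures are mutually consistent if the GPU configuration's price per unit time is about 3.5x that of the CPU configuration. That price ratio is an assumption for illustration, not a figure from the slide:

```python
# Back-of-the-envelope consistency check with an assumed price ratio:
# if a job finishes 70x faster but the hardware costs 3.5x more per
# unit time, the total cost of the job drops by 70 / 3.5 = 20x.
speedup = 70.0       # from the benchmark slide
price_ratio = 3.5    # GPU-config price / CPU-config price (assumed)

cost_effectiveness = speedup / price_ratio
print(cost_effectiveness)  # 20.0
```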
  14. Accelerated Analytics Cuts Costs and Carbon
     “A 2023 benchmark showed that the RAPIDS Accelerator can reduce a company’s carbon footprint by as much as 80% while delivering 5x average speedups and 4x reductions in computing costs.”
     RAPIDS Accelerator for Apache Spark: https://blogs.nvidia.com/blog/spark-rapids-energy-efficiency/
  15. Sharing Resources with Multi-Tenancy: Smoothing Out Demand Peaks with Shared Capacity
     Using Kubernetes, we created an autoscaling cluster for interactive Jupyter sessions. Users only consume GPUs while they are running computations, and the cluster keeps some reserved GPU capacity so that user computations start quickly. An overhead of 30% meant that 60% of user computations started within 2 seconds, and 90% within 60 seconds. This can be tuned to suit your needs: more overhead capacity results in reduced wait times. Whatever your preference, your cost is always correlated with your compute demand.
     https://docs.rapids.ai/deployment/stable/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook/
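A sketch of the cost behaviour described above, with hypothetical numbers: the cluster bills for current demand plus the reserved-overhead fraction, so cost tracks demand instead of a statically sized peak.

```python
# Illustrative model of autoscaling with reserved overhead capacity.
# The 30% overhead matches the example above; the demand figures are
# made up.
def gpu_hours_billed(demand_gpu_hours: float, overhead: float = 0.30) -> float:
    """GPU-hours paid for: actual demand plus reserved overhead capacity."""
    return demand_gpu_hours * (1.0 + overhead)

quiet_day = gpu_hours_billed(10.0)    # about 13 GPU-hours
busy_day = gpu_hours_billed(100.0)    # about 130 GPU-hours

# A static cluster sized for the busy day would bill ~130 GPU-hours on
# the quiet day too; with autoscaling, cost stays proportional to demand.
print(round(quiet_day, 1), round(busy_day, 1))
```

Raising the overhead fraction lowers wait times at the cost of more idle reserved capacity, which is the tuning knob the example describes.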
  16. Recap: Reducing the Cost of your Data Science Workloads on the Cloud
     • Accelerated RAPIDS libraries can give up to a 150x speedup with zero code changes
     • Using NVIDIA accelerated hardware on the cloud can reduce costs
     • Different businesses prefer to reduce capital, environmental, and human costs differently
     • RAPIDS + GPU cloud computing allows you to tune the benefits to suit your goals