Data Processing Evolution: Faster Data Access, Less Data Movement

▸ Hadoop processing, reading from disk: every stage (Query, ETL, ML Train) reads from and writes back to HDFS
▸ Spark in-memory processing: one HDFS read, then Query, ETL, and ML Train run in memory
▸ Traditional GPU processing: each stage still copies data between CPU and GPU memory; 5-10x improvement, more code, language rigid, substantially on GPU
▸ RAPIDS with Arrow: one read, then Query, ETL, and ML Train stay on the GPU; 50-100x improvement, same code, language flexible, primarily on GPU
Faster Speeds, Real-World Benefits

▸ End-to-end benchmark stages: cuIO/cuDF (load and data preparation), data conversion, and XGBoost machine learning
▸ 200GB CSV dataset; data prep includes joins and variable transformations
▸ CPU cluster configuration: CPU nodes (61 GiB memory, 8 vCPUs, 64-bit platform), Apache Spark
▸ GPU cluster configuration: 16 A100 GPUs (40GB each)
▸ RAPIDS version: RAPIDS 0.17
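As a rough illustration of the shape of this pipeline (not the benchmark code itself), the sketch below loads CSVs with cuDF, performs a join and a simple variable transformation, then trains XGBoost directly from the GPU DataFrame. File names, column names, and parameters are assumptions.

```python
import cudf
import xgboost as xgb

# cuIO/cuDF: load and prepare the data entirely on the GPU
left = cudf.read_csv("transactions.csv")    # hypothetical input file
right = cudf.read_csv("customers.csv")      # hypothetical input file
df = left.merge(right, on="customer_id", how="left")      # example join
df["amount_scaled"] = df["amount"] / df["amount"].max()   # example variable transformation

# XGBoost: train on the GPU, consuming the cuDF DataFrame directly
dtrain = xgb.DMatrix(df.drop(columns=["label"]), label=df["label"])
params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}
model = xgb.train(params, dtrain, num_boost_round=100)
```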
cuDF is…

▸ A Python library for manipulating GPU DataFrames following the Pandas API
▸ A Python interface to a CUDA C++ library, with additional functionality
▸ Able to create GPU DataFrames from NumPy arrays, Pandas DataFrames, and PyArrow Tables
▸ Capable of JIT compilation of User-Defined Functions (UDFs) using Numba
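A minimal sketch of those entry points, assuming a recent cuDF release (the UDF API has shifted between versions: `Series.apply` in current releases, `applymap` in older ones):

```python
import numpy as np
import pandas as pd
import pyarrow as pa
import cudf

# GPU DataFrame from a pandas DataFrame
pdf = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
gdf = cudf.DataFrame.from_pandas(pdf)

# GPU DataFrame from NumPy arrays
gdf_np = cudf.DataFrame({"x": np.arange(5), "y": np.linspace(0.0, 1.0, 5)})

# GPU DataFrame from a PyArrow Table
gdf_pa = cudf.DataFrame.from_arrow(pa.table({"c": [10, 20, 30]}))

# User-defined function, JIT-compiled with Numba and executed on the GPU
def add_one(x):
    return x + 1

gdf["a_plus_one"] = gdf["a"].apply(add_one)
```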
Scale Out / Parallelize: Dask

▸ Easy to use on a laptop
▸ Scales out to thousand-node clusters
▸ Modularly built for acceleration

DEPLOYABLE
▸ HPC: SLURM, PBS, LSF, SGE
▸ Cloud: Kubernetes
▸ Hadoop/Spark: YARN

PYDATA NATIVE
▸ Easy migration: built on top of NumPy, Pandas, Scikit-Learn, etc.
▸ Easy training: with the same APIs

POPULAR
▸ Most common parallelism framework today in the PyData and SciPy community
▸ Millions of monthly downloads and dozens of integrations

PYDATA (single CPU core, in-memory data): NumPy, Pandas, Scikit-Learn, Numba, and many more
DASK (multi-core and distributed PyData): NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, … -> Dask Futures
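A small illustration of the Dask model described above: the familiar pandas-style API, partitioned and executed in parallel. The file pattern and column names are placeholders.

```python
import dask.dataframe as dd

# Build a lazily evaluated, partitioned DataFrame from many CSV files
ddf = dd.read_csv("data/2021-*.csv")          # placeholder file pattern

# Pandas-style operations stay lazy until compute() is called
result = ddf.groupby("key")["value"].mean()

# compute() executes the task graph across local cores or a cluster
print(result.compute())
```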
Scale Up with RAPIDS

▸ PYDATA (single CPU core, in-memory data): NumPy, Pandas, Scikit-Learn, NetworkX, Numba, and many more
▸ RAPIDS AND OTHERS (scale up / accelerate): Pandas -> cuDF, Scikit-Learn -> cuML, NetworkX -> cuGraph, Numba -> Numba
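A minimal sketch of the "scale up" drop-in pattern, assuming synthetic data: cuML mirrors the scikit-learn estimator interface but runs on the GPU.

```python
import cudf
from cuml.cluster import KMeans

# A tiny synthetic GPU DataFrame
gdf = cudf.DataFrame({"x": [1.0, 1.1, 5.0, 5.2],
                      "y": [0.9, 1.0, 5.1, 4.8]})

# Same estimator interface as scikit-learn, executed on the GPU
km = KMeans(n_clusters=2)
km.fit(gdf)
print(km.labels_)          # cluster assignments, kept in device memory
```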
Scale Out with RAPIDS + Dask with OpenUCX

▸ PYDATA (single CPU core, in-memory data): NumPy, Pandas, Scikit-Learn, Numba, and many more
▸ RAPIDS AND OTHERS (scale up / accelerate): Pandas -> cuDF, Scikit-Learn -> cuML, NetworkX -> cuGraph, Numba -> Numba
▸ DASK (multi-core and distributed PyData; scale out / parallelize): NumPy -> Dask Array, Pandas -> Dask DataFrame, Scikit-Learn -> Dask-ML, … -> Dask Futures
▸ RAPIDS + DASK WITH OPENUCX: multi-GPU on a single node (DGX) or across a cluster
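A sketch of the combined pattern under those assumptions: dask-cuda starts one worker per GPU, and dask_cudf partitions a cuDF DataFrame across them. The file path and column names are placeholders.

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# One Dask worker per visible GPU; passing protocol="ucx" enables
# OpenUCX transports (NVLink/InfiniBand) when UCX is installed
cluster = LocalCUDACluster()
client = Client(cluster)

# A cuDF DataFrame partitioned across the GPU workers
ddf = dask_cudf.read_csv("data/*.csv")        # placeholder path
print(ddf.groupby("key")["value"].sum().compute())
```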
A Bigger, Better, Stronger Ecosystem for All

cuGraph
▸ Graph analytics
▸ Compatible with NetworkX, SciPy, and CuPy

cuSpatial
▸ Spatial analytics
▸ Point-in-polygon and distance calculations

cuSignal
▸ Signal processing

NVTabular
▸ ETL library for recommender systems

CLX/cyBERT
▸ Cyber log acceleration
▸ Utilizes NLP and transformer architectures for cybersecurity tasks

Data visualization
▸ cuxfilter and Plotly Dash
▸ Part of the PyViz community

BlazingSQL
▸ GPU-accelerated SQL engine built on top of RAPIDS

Streamz
▸ Distributed stream processing
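To make one of these concrete, here is a hedged sketch of cuGraph's NetworkX-like workflow on a GPU edge list; the edge list and column names are illustrative assumptions.

```python
import cudf
import cugraph

# A small edge list held in a GPU DataFrame
edges = cudf.DataFrame({"src": [0, 1, 2, 2], "dst": [1, 2, 0, 3]})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# PageRank returns a cuDF DataFrame of per-vertex scores
pr = cugraph.pagerank(G)
print(pr.head())
```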
Interoperability

▸ Real-world workflows need to share data between libraries
▸ RAPIDS supports device memory sharing between many popular data science and deep learning libraries
▸ Keeps data on the GPU and avoids costly copies back and forth to host memory
▸ Any library that supports DLPack or __cuda_array_interface__ can share memory buffers with RAPIDS and supported libraries
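A minimal sketch of that zero-copy sharing, assuming a cuDF release in which Series exposes __cuda_array_interface__ directly (true of the RAPIDS 0.17 era referenced above; newer releases also offer Series.to_cupy()):

```python
import cudf
import cupy as cp

gdf = cudf.DataFrame({"x": [1.0, 2.0, 3.0]})

# CuPy array over the same device buffer, exchanged through
# __cuda_array_interface__ -- no copy to host memory
arr = cp.asarray(gdf["x"])

# DLPack capsule for hand-off to any DLPack-aware framework
# (e.g. a deep learning library's from_dlpack function)
capsule = gdf["x"].to_dlpack()
```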
RAPIDS Everywhere: The Next Phase of RAPIDS

The goal is to make RAPIDS as usable and performant as possible wherever data science is done. We will continue to work with more open-source projects to further democratize acceleration and efficiency in data science.
Deploy RAPIDS Everywhere: Focused on Robust Functionality, Deployment, and User Experience

▸ Support for cloud-specific machine instances
▸ Support for enterprise and HPC orchestration layers
▸ Integrations including Cloud Dataproc and Azure Machine Learning