Some of the most demanding machine learning (ML) use cases we have encountered involve pipelines that span both CPU and GPU devices in distributed environments. Common examples of these workloads include:
* Batch inference, where a CPU-intensive preprocessing stage (e.g., video decoding or image resizing) runs before a GPU-intensive model makes predictions.
* Distributed training, where similar CPU-heavy transformations are required to prepare or augment the dataset prior to GPU training.
In this talk, we examine how Ray Data streaming works and how to use it in your own machine learning pipelines so that these common workloads make full use of your compute resources, both CPUs and GPUs, at scale.
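As a rough illustration of the pattern, the sketch below chains a CPU preprocessing stage and a GPU inference stage with Ray Data so that batches stream through both at once. It assumes a recent Ray version; the dataset path, preprocessing logic, `Predictor` class, and resource settings (`concurrency`, `num_gpus`, `batch_size`) are illustrative placeholders rather than anything prescribed by the talk.

```python
import numpy as np
import ray


def preprocess(batch: dict) -> dict:
    """CPU stage: normalize images before they reach the GPU (assumes uniform image sizes)."""
    batch["image"] = batch["image"].astype(np.float32) / 255.0
    return batch


class Predictor:
    """GPU stage: load the model once per actor and reuse it across batches."""

    def __init__(self):
        # Placeholder "model"; swap in your real framework's model load here.
        self.model = lambda imgs: np.array([float(np.mean(img)) for img in imgs])

    def __call__(self, batch: dict) -> dict:
        batch["prediction"] = self.model(batch["image"])
        return batch


# Hypothetical dataset location.
ds = ray.data.read_images("s3://my-bucket/images/")

predictions = (
    ds.map_batches(preprocess)          # runs on CPU workers
      .map_batches(
          Predictor,
          concurrency=2,                # pool of 2 GPU actors (illustrative)
          num_gpus=1,                   # one GPU per actor
          batch_size=64,
      )
)

# Because execution is streamed, CPUs keep preprocessing new blocks
# while GPUs run inference on blocks produced earlier.
predictions.show(limit=3)
```

The same shape applies to the training case: the CPU transformations stay in the Ray Data pipeline, and the GPU stage is replaced by the trainer consuming the streamed batches.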