Offline Batch Inference: Comparing Ray, Apache Spark, SageMaker

As more companies use large-scale machine learning (ML) models for training and evaluation, offline batch inference becomes an essential workload. It comes with several challenges: managing compute infrastructure, making full use of heterogeneous resources (CPUs and GPUs), and transferring data from storage to hardware accelerators. Ray addresses these challenges by coordinating clusters of heterogeneous resources and matching them to the specific resource requirements of each part of the workload, and in our benchmarks it performs significantly better than the alternatives.

In this talk we will:
* Discuss the challenges and limitations of offline batch inference
* Examine three different solutions for offline batch inference: AWS SageMaker Batch Transform, Apache Spark, and Ray Data
* Share our performance numbers showing Ray Data as the best solution for offline batch inference at scale

Anyscale

June 22, 2023

Transcript

  4. Talk Overview
    - What is batch inference and why does it matter?
    - Challenges with batch inference
    - Exploring the current solution space
    - Comparing three solutions: SageMaker Batch Transform, Apache Spark, Ray Data
  6. Challenges
    1. Managing heterogeneous compute infrastructure
    2. Utilizing all resources in the cluster
    3. Efficient data transfer: storage -> CPU RAM -> GPU RAM
    4. Developer experience
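Challenges #2 and #3 are at heart a pipelining problem: if reading, CPU preprocessing, and GPU inference run strictly one after another, the GPU sits idle while data is fetched and decoded. The sketch below is a minimal, single-machine illustration using only the Python standard library. The stage functions and `fake_model` are hypothetical stand-ins, and threads stand in for the separate processes or machines a real system would use. Bounded queues provide backpressure, so no stage buffers the whole dataset in memory.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def fake_model(batch):
    """Hypothetical stand-in for GPU inference: doubles each input."""
    return [x * 2 for x in batch]


def read_stage(paths, out_q):
    # Stage 1 (I/O): pretend each path yields raw bytes from storage.
    for path in paths:
        out_q.put(path.encode())
    out_q.put(_DONE)


def preprocess_stage(in_q, out_q):
    # Stage 2 (CPU): a toy "decode" step that turns bytes into a number.
    while (item := in_q.get()) is not _DONE:
        out_q.put(len(item))
    out_q.put(_DONE)


def infer_stage(in_q, results, batch_size):
    # Stage 3 (GPU stand-in): accumulate fixed-size batches, then run the model.
    batch = []
    while (item := in_q.get()) is not _DONE:
        batch.append(item)
        if len(batch) == batch_size:
            results.extend(fake_model(batch))
            batch = []
    if batch:  # flush the final partial batch
        results.extend(fake_model(batch))


def run_pipeline(paths, batch_size=4, max_in_flight=8):
    # Bounded queues apply backpressure: a fast reader blocks instead of
    # buffering everything while a slower downstream stage catches up.
    raw_q = queue.Queue(maxsize=max_in_flight)
    ready_q = queue.Queue(maxsize=max_in_flight)
    results = []
    stages = [
        threading.Thread(target=read_stage, args=(paths, raw_q)),
        threading.Thread(target=preprocess_stage, args=(raw_q, ready_q)),
        threading.Thread(target=infer_stage, args=(ready_q, results, batch_size)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results
```

For example, `run_pipeline(["a", "bb", "ccc"], batch_size=2)` returns `[2, 4, 6]`: each path's length, doubled by the toy model, in input order (one consumer per FIFO queue preserves ordering).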
  8. Approach 1: Batch Services (AWS Batch, GCP Batch, Azure Batch)
    - Partially handle Challenge #1 (managed infrastructure)
    - No heterogeneous clusters
    - Don't handle Challenges #2, #3, #4
    - What about Modal Labs?
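To make "heterogeneous cluster" concrete: a scheduler has to match each task's resource demand (CPUs for preprocessing, GPUs for inference) to nodes that actually have those resources. Ray lets tasks declare demands such as `num_cpus` and `num_gpus`; the toy first-fit placer below is not Ray's scheduler, only a sketch of the idea, and the `Node` and `place` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: dict  # resource name -> amount, e.g. {"CPU": 16, "GPU": 4}
    used: dict = field(default_factory=dict)

    def fits(self, demand):
        # A task fits only if every requested resource is still available.
        return all(
            self.used.get(res, 0) + amount <= self.capacity.get(res, 0)
            for res, amount in demand.items()
        )

    def allocate(self, demand):
        for res, amount in demand.items():
            self.used[res] = self.used.get(res, 0) + amount


def place(tasks, nodes):
    """First-fit placement. tasks: list of (task_name, demand) pairs.

    Returns {task_name: node_name}, with None for tasks that cannot be placed.
    """
    placement = {}
    for task_name, demand in tasks:
        target = next((n for n in nodes if n.fits(demand)), None)
        if target is not None:
            target.allocate(demand)
        placement[task_name] = target.name if target is not None else None
    return placement
```

With a CPU-only node `{"CPU": 2}` and a GPU node `{"CPU": 1, "GPU": 1}`, two one-CPU preprocessing tasks land on the CPU node, the first GPU task lands on the GPU node, and a second GPU task gets `None` because no GPU is left.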
  10. Approach 2: Online Inference Solutions (BentoML, Ray Serve, SageMaker Batch Transform)
    • Abstracts away infrastructure complexities
    • Abstractions for model packaging
    • Framework integrations
    But: unnecessary complexities for offline inference
    • Starting an HTTP server, sending requests over the network…
    • Hard to saturate GPUs
    • BentoML integrates with Spark for offline inference
  11. Approach 3: Distributed Data Systems (Spark, Ray Data)
    Designed to handle map operations on large datasets, with native support for:
    • Scaling across clusters
    • Data partitioning and batching
    • An I/O layer to connect to data sources
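Ray Data's `Dataset.map_batches` and Spark's `mapPartitions` both expose variants of the partition-then-batch pattern. The pure-Python sketch below, with hypothetical helper names and serial execution, shows only the data layout: split the dataset into blocks, cut each block into fixed-size batches, and apply a function per batch. Because blocks are independent, a distributed engine could hand each one to a different worker.

```python
from itertools import islice


def partition(items, num_blocks):
    """Split items into num_blocks contiguous blocks whose sizes differ by at most 1."""
    size, rem = divmod(len(items), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        end = start + size + (1 if i < rem else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


def iter_batches(block, batch_size):
    """Yield lists of up to batch_size items from one block."""
    it = iter(block)
    while batch := list(islice(it, batch_size)):
        yield batch


def map_in_batches(blocks, fn, batch_size):
    """Apply fn batch by batch; here blocks run serially, in order."""
    out = []
    for block in blocks:
        for batch in iter_batches(block, batch_size):
            out.extend(fn(batch))
    return out
```

For example, `partition(list(range(10)), 3)` yields blocks of sizes 4, 3, and 3, and mapping a squaring function over them in batches of 2 returns the squares of 0 through 9 in order.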
  12. Benchmark: pretrained ResNet-50 model on ImageNet data
    1. Reading images from S3
    2. Simple CPU preprocessing (resizing, cropping, normalization)
    3. Model inference on GPU
    Dataset sizes: 10 GB and 300 GB
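The cropping and normalization in step 2 can be sketched in plain Python on a nested-list image. This is an illustration only: the slide does not say which library or constants the benchmark used, so the ImageNet mean/std values below are an assumption (they are the values commonly used with torchvision's pretrained models). Resizing is omitted because it needs an interpolation routine.

```python
# Assumed ImageNet per-channel statistics (RGB), as commonly used with torchvision.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def center_crop(img, size):
    """Return the central size x size patch of an H x W x C nested-list image."""
    h, w = len(img), len(img[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]


def normalize(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale 0-255 pixels to [0, 1], then standardize each channel."""
    return [
        [[(px / 255.0 - m) / s for px, m, s in zip(pixel, mean, std)] for pixel in row]
        for row in img
    ]
```

In a real pipeline these per-image operations are the CPU-bound stage that runs ahead of GPU inference, which is why the benchmark separates them from step 3.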
  13. SageMaker Batch Transform
    Partially addresses Challenge #1: abstracts away infrastructure management.
    But SageMaker Batch Transform uses an architecture built for online serving:
    • Starts an HTTP server and deploys the model as an endpoint
    • Sends each image as a separate request to the server
    • Cannot batch across multiple files
    • Max payload size is 100 MB -> cannot saturate GPUs!
    Poor developer UX and difficult debugging
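Why a per-request payload cap limits GPU utilization: the batch a GPU sees can be no larger than what fits in one request. The hypothetical `pack_requests` helper below greedily groups consecutive records under a byte cap. It is a sketch of the constraint, not of SageMaker's internals, which per the slide cannot batch across files at all.

```python
def pack_requests(record_sizes, max_payload_bytes):
    """Greedily group consecutive records into requests under a payload cap.

    Returns a list of requests, each a list of record indices.
    Raises ValueError if a single record exceeds the cap.
    """
    requests, current, current_bytes = [], [], 0
    for i, size in enumerate(record_sizes):
        if size > max_payload_bytes:
            raise ValueError(f"record {i} ({size} bytes) exceeds the payload cap")
        # Start a new request when adding this record would exceed the cap.
        if current and current_bytes + size > max_payload_bytes:
            requests.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        requests.append(current)
    return requests
```

With the cap set to the slide's 100 MB, the largest possible batch is however many images fit in a single request, regardless of how much GPU memory is available.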
  15. Comparing Ray Data and Spark
    Challenge #2: Utilizing all resources in the cluster for CPU+GPU workloads
    Challenge #3: Efficient data transfer for multi-dimensional tensors
    - NumPy + PyArrow, no pandas overhead
    - No JVM <-> PyArrow overhead
    Challenge #4: Developer experience
    - Ray is Python-first
    - Easier debugging, better stack traces
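The "zero-copy" point under Challenge #3 means a tensor is a view over an existing buffer rather than a converted copy. NumPy and PyArrow do this with their own buffer types; the standard-library sketch below uses `memoryview.cast` as a stand-in to show the behavior: the reshaped view sees later writes to the buffer, while an explicit `bytes` copy does not. The `tensor_view` name is hypothetical.

```python
def tensor_view(buf, shape):
    """Return a zero-copy, C-contiguous uint8 view of buf with the given shape."""
    return memoryview(buf).cast("B", shape)


frame = bytearray(range(6))        # one flat buffer, e.g. bytes read from storage
view = tensor_view(frame, (2, 3))  # reinterpret as a 2 x 3 "tensor" without copying
snapshot = bytes(frame)            # an explicit copy, for contrast

frame[5] = 99                      # mutate the underlying buffer in place

print(view.tolist())   # the view reflects the write
print(snapshot[5])     # the copy still holds the old value
```

Running it prints `[[0, 1, 2], [3, 4, 99]]` and then `5`. Avoiding such copies (and format conversions such as Arrow to pandas, or JVM to Python) is what the slide's "no overhead" bullets refer to.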
  16. Does Ray Data Scale to 10 TB?
    40-GPU cluster
    Throughput: 11,580.958 img/sec
    90%+ GPU utilization
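Normalizing the reported throughput by cluster size gives a rough per-GPU figure, assuming the load is spread evenly across the 40 GPUs:

```latex
\[
\frac{11{,}580.958\ \text{img/s}}{40\ \text{GPUs}} \approx 289.5\ \text{img/s per GPU}
\]
```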
  20. Summary
    Ray Data outperforms SageMaker and Spark for offline batch inference.
    Ray Data meets all 4 challenges:
    1. Abstracts away compute infrastructure management and supports heterogeneous clusters.
    2. Streams data from cloud storage -> CPU -> GPU, utilizing all cluster resources.
    3. Supports multi-dimensional tensors with zero-copy exchange.
    4. Is Python-native, making it easy to develop and debug.