Can Ray Dynamically Schedule GPU Tasks Based on Known GPU Performance Profiles?

I’m exploring whether Ray is a good fit for the following large-scale GPU inference setup, and would appreciate guidance or validation.

High-Level Requirements:

  • I have billions of small files stored in S3.
  • I want to run inference using a model that fits on any single GPU.
  • Assume the Ray cluster runs on Kubernetes with heterogeneous GPUs (e.g., A100, T4, V100), including spot instances — so pods may disappear anytime.

I want to maximize GPU utilization by:

  • Dynamically assigning file batches to any available GPU.
  • Using larger batches for faster GPUs, and smaller ones for slower GPUs.
  • Automatically feeding more data as soon as a GPU finishes a batch.
  • Ensuring fault tolerance in case a pod dies midway (no duplicates or lost batches).

My Question:
Let’s say I’ve done benchmarking in advance and know approximately how many files or batches each GPU type (e.g., A100 vs T4) can handle efficiently. Given this:

Can Ray:

  1. Dynamically push work to whichever pod/GPU becomes available — without pre-assigning static partitions?
  2. Use the available resource metadata (e.g., GPU type or speed) to adjust batch sizes or workload dynamically?
  3. Handle fault-tolerant task re-execution if a pod (e.g., spot instance) is interrupted mid-processing?
  4. Integrate with an orchestrator (like Airflow or Argo) to manage this whole setup in a multi-stage pipeline?

Thanks in advance — looking to validate this before investing further in the implementation!

Yes, Ray can dynamically assign work to available GPUs without static partitioning, using its task and actor scheduling system. You'll most likely do this with Ray Core and Ray Data. You can read more about this in the Ray scheduling documentation.
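
To make that concrete, here is a minimal sketch of the pattern with plain Ray Core tasks: each batch is a task that requests one GPU, and a `ray.wait` loop keeps refilling the cluster as soon as any GPU finishes. The names (`run_inference`, the S3 keys, the in-flight cap) are placeholders, not anything from your setup:

```python
import ray

ray.init()  # on a real cluster you'd typically connect via address="auto" or the Jobs API

# Each call claims one GPU and is scheduled onto whichever node has a free GPU,
# with no static partitioning of the input.
@ray.remote(num_gpus=1)
def run_inference(batch_of_keys):
    # Stand-in for loading the model (cached per worker in a real setup)
    # and running inference over the files referenced by `batch_of_keys`.
    return len(batch_of_keys)

# Pretend queue of S3 key batches (in practice, stream these from a manifest).
pending = [[f"s3://bucket/file-{i}-{j}" for j in range(32)] for i in range(1000)]

max_in_flight = 64          # cap how many batches are submitted at once
in_flight = []
results = []

while pending or in_flight:
    # Keep the cluster saturated: submit new batches while under the cap.
    while pending and len(in_flight) < max_in_flight:
        in_flight.append(run_inference.remote(pending.pop()))

    # As soon as any GPU finishes, collect its result and loop back to refill.
    done, in_flight = ray.wait(in_flight, num_returns=1)
    results.extend(ray.get(done))
```

Bounding the number of in-flight batches keeps the driver from holding billions of object refs at once; Ray Data's streaming execution gives you a similar effect with less bookkeeping.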

Ray can use resource metadata to adjust batch sizes or workloads dynamically. You can implement custom logic that picks a batch size based on the GPU type; Ray has a few resource-management features built in for this.
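
One way to wire your benchmark numbers in, assuming the cluster uses NVIDIA accelerator types that Ray recognizes, is to pin tasks to an accelerator type and look up a per-type batch size. This is a sketch only; the batch sizes and helper names are illustrative:

```python
import ray
from ray.util.accelerators import NVIDIA_TESLA_A100, NVIDIA_TESLA_T4

ray.init()

# Batch sizes from your own offline benchmarking (illustrative numbers only).
BATCH_SIZE = {NVIDIA_TESLA_A100: 256, NVIDIA_TESLA_T4: 64}

@ray.remote(num_gpus=1)
def run_inference(keys):
    # Stand-in for per-batch inference over the given S3 keys.
    return len(keys)

def submit_for(gpu_type, keys):
    # Pin the task to a specific accelerator type and size the batch to match.
    batch = keys[: BATCH_SIZE[gpu_type]]
    return run_inference.options(accelerator_type=gpu_type).remote(batch)

keys = [f"s3://bucket/file-{i}" for i in range(10_000)]
refs = [submit_for(NVIDIA_TESLA_A100, keys), submit_for(NVIDIA_TESLA_T4, keys)]
print(ray.get(refs))
```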

Generally, if a pod/node is interrupted, Ray will automatically reschedule the failed tasks, as long as your workload is idempotent and you use Ray Data's checkpointing (you can also build your own if you like). You can read more about it here: Tasks — Ray 2.47.1
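
As a sketch of the fault-tolerance side, assuming each batch writes to a deterministic output key (so a retried task overwrites rather than duplicates), task retries can look like this; `infer` and `write_output` are stand-ins for your model call and S3 writer:

```python
import ray

ray.init()

def infer(keys):
    # Stand-in for the real model call.
    return [{"key": k, "label": 0} for k in keys]

def write_output(path, rows):
    # Stand-in for an S3 write; a deterministic path per batch keeps retries idempotent.
    print(f"would write {len(rows)} rows to {path}")

# max_retries re-runs the task if its worker or node dies (e.g., spot preemption);
# retry_exceptions=True also retries on exceptions raised inside the task.
@ray.remote(num_gpus=1, max_retries=3, retry_exceptions=True)
def run_inference(batch_id, keys):
    rows = infer(keys)
    write_output(f"s3://bucket/outputs/batch-{batch_id}.parquet", rows)
    return batch_id

refs = [
    run_inference.remote(i, [f"s3://bucket/file-{i}-{j}" for j in range(8)])
    for i in range(4)
]
print(ray.get(refs))
```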

We do have integrations available! You can use the ones you mentioned to define and manage complex workflows, while Ray handles the distributed execution of tasks.
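
For the orchestrator piece, one common pattern is to have an Airflow or Argo step submit the Ray stage through the Ray Jobs API and poll it to completion. The head-node address and the `inference.py` entrypoint below are placeholders:

```python
import time

from ray.job_submission import JobSubmissionClient, JobStatus

# Address of the Ray head node's dashboard / Jobs API (placeholder).
client = JobSubmissionClient("http://ray-head:8265")

# An orchestrator step can submit the inference stage as a Ray job;
# "inference.py" is a hypothetical driver script in the working directory.
job_id = client.submit_job(
    entrypoint="python inference.py",
    runtime_env={"working_dir": "./"},
)

# Poll until the job reaches a terminal state.
while True:
    status = client.get_job_status(job_id)
    if status in (JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.STOPPED):
        break
    time.sleep(10)

print(f"Job {job_id} finished with status {status}")
```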

Here’s a list of our integrations!