I’m exploring whether Ray is a good fit for the following large-scale GPU inference setup, and would appreciate guidance or validation.
High-Level Requirements:
- I have billions of small files stored in S3.
- I want to run inference using a model that fits on any single GPU.
- Assume the Ray cluster runs on Kubernetes with heterogeneous GPUs (e.g., A100, T4, V100), including spot instances — so pods may disappear anytime.
I want to maximize GPU utilization by (rough sketch of the pattern I have in mind after this list):
- Dynamically assigning file batches to any available GPU.
- Using larger batches for faster GPUs, and smaller ones for slower GPUs.
- Automatically feeding more data as soon as a GPU finishes a batch.
- Ensuring fault tolerance in case a pod dies midway (no duplicates or lost batches).
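For context, this is roughly the dispatch pattern I'm picturing: keep a bounded number of batches in flight and hand the next batch to whichever GPU finishes first, leaning on Ray's task retries when a pod is preempted. `run_model` and `all_batches` are placeholders for my own inference code and S3 batch listing, and the in-flight limit is illustrative, so treat this as a sketch of intent rather than a working implementation.

```python
# Minimal sketch of dynamic batch dispatch with plain Ray tasks.
import ray

ray.init(address="auto")


@ray.remote(num_gpus=1, max_retries=3)  # re-run the task if its pod is preempted
def infer_batch(file_keys):
    return run_model(file_keys)  # hypothetical inference function (placeholder)


def run(all_batches, max_in_flight=64):
    in_flight, results = [], []
    for batch in all_batches:
        if len(in_flight) >= max_in_flight:
            # Block until *any* task finishes, i.e. until some GPU frees up,
            # then immediately submit the next batch.
            done, in_flight = ray.wait(in_flight, num_returns=1)
            results.extend(ray.get(done))
        in_flight.append(infer_batch.remote(batch))
    results.extend(ray.get(in_flight))
    return results
```

With billions of files I'd obviously stream the batch listing rather than materialize it up front, but the shape of this loop is what I'm really asking about.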
My Question:
Let’s say I’ve done benchmarking in advance and know approximately how many files or batches each GPU type (e.g., A100 vs T4) can handle efficiently. Given this:
Can Ray:
- Dynamically push work to whichever pod/GPU becomes available — without pre-assigning static partitions?
- Use the available resource metadata (e.g., GPU type or speed) to adjust batch sizes or workload dynamically? (See the sketch after this list.)
- Handle fault-tolerant task re-execution if a pod (e.g., spot instance) is interrupted mid-processing?
- Integrate with an orchestrator (like Airflow or Argo) to manage this whole setup in a multi-stage pipeline?
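For the batch-size question, this is the kind of thing I'm hoping is possible: pinning tasks to a GPU type via Ray's `accelerator_type` option and choosing the batch size from my benchmark table. The batch-size numbers are made up and the exact accelerator constant names may differ by Ray version, so this is only a sketch of what I'd like to do.

```python
# Sketch of per-GPU-type batch sizing (benchmark numbers below are made up).
import ray
from ray.util.accelerators import NVIDIA_TESLA_A100, NVIDIA_TESLA_T4

BATCH_SIZES = {NVIDIA_TESLA_A100: 512, NVIDIA_TESLA_T4: 128}  # from my benchmarks


@ray.remote(num_gpus=1, max_retries=3)
def infer_batch(file_keys):
    ...  # placeholder; actual inference elided


def submit_for(gpu_type, file_keys):
    # Pin the tasks to one GPU type and size each batch from the benchmark table.
    size = BATCH_SIZES[gpu_type]
    return [
        infer_batch.options(accelerator_type=gpu_type).remote(file_keys[i:i + size])
        for i in range(0, len(file_keys), size)
    ]
```

For the orchestrator question, my rough idea is to have Airflow or Argo submit each pipeline stage as a Ray job through the Ray Jobs API, something like the following (the address, entrypoint, and working directory are placeholders):

```python
from ray.job_submission import JobSubmissionClient

# Address of the Ray head / dashboard is a placeholder.
client = JobSubmissionClient("http://<ray-head>:8265")
job_id = client.submit_job(
    entrypoint="python run_inference.py",        # hypothetical stage script
    runtime_env={"working_dir": "./inference"},  # hypothetical project dir
)
print(client.get_job_status(job_id))
```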
Thanks in advance; I'm looking to validate this approach before investing further in the implementation!