1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.
2. Environment:
Ray version: 2.40.0
Python version: 3.10.14
OS: Ubuntu 22.04.3 LTS
Cloud/Infrastructure:
Other libs/tools (if relevant):
3. What happened vs. what you expected:
I have a multi-step data processing pipeline written with Ray Data. Two of the steps (let's call them A and B) involve GPUs. Of the two, step A uses a small model, so it requires less GPU RAM and finishes earlier. Both steps are processed with map_batches, and I specify the number of GPUs to use.
Let's say I have 8 GPUs on a machine. Ideally I want to keep all of them busy throughout the entire run: assign step A 0.1 GPU with a concurrency of 8, and assign step B 0.9 GPU with a concurrency of 8. This way, all 8 GPUs are always working.
Expected: the 8 copies of each step's model are spread one per GPU. In other words, each GPU is shared by one copy of the step A model and one copy of the step B model (0.1 + 0.9 = 1).
Actual: all 8 copies of the step A model are assigned to GPU 0.
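For concreteness, here is roughly what the two map_batches calls look like (StepA/StepB are placeholders for the actual model wrappers; the dataset and batch sizes are made up):

```python
import ray

class StepA:  # placeholder for the small-model stage
    def __call__(self, batch):
        # small, fast model inference would happen here
        return batch

class StepB:  # placeholder for the large-model stage
    def __call__(self, batch):
        # large, slow model inference would happen here
        return batch

ds = ray.data.range(1_000)  # stand-in for the real input dataset

# Intended layout: 8 StepA actors at 0.1 GPU each plus 8 StepB actors at
# 0.9 GPU each, i.e. one A actor and one B actor co-located on every GPU
# (0.1 + 0.9 = 1.0 per GPU).
ds = ds.map_batches(StepA, num_gpus=0.1, concurrency=8, batch_size=64)
ds = ds.map_batches(StepB, num_gpus=0.9, concurrency=8, batch_size=16)
ds.materialize()

# Observed: all 8 StepA actors end up on GPU 0 (8 x 0.1 = 0.8 of that GPU)
# instead of being spread one per GPU.
```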
I think it's possible to achieve the same effect by merging the two steps into a single step (actor) and assigning it a whole GPU.
But is it possible to use a balanced assignment instead of greedily assigning models/actors to each GPU one by one?
In my use case, step A runs significantly faster than step B, so it finishes much earlier. With my current implementation, in which each step is its own actor class, assigning 1 GPU to step A and the other 7 to step B leaves GPU 0 completely idle after step A finishes. But an instance of the step A actor only requires about 10% of a GPU's RAM, so with balanced assignment I could give each GPU 10% of its RAM for step A and 90% for step B. That way, all GPUs would stay busy the whole time.
A potential issue with merging them into one actor class is that there is an intermediate, CPU-only and CPU-intensive processing step between A and B, and I suspect it would be hard to parallelize that step inside a merged actor class.
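For reference, this is the kind of merged actor I mean (purely illustrative; the method bodies are placeholders):

```python
class MergedStep:  # hypothetical merged actor that owns a whole GPU
    def _run_step_a(self, batch):
        return batch  # small-model inference (GPU)

    def _cpu_intensive_middle(self, batch):
        return batch  # CPU-only, CPU-heavy step; runs serially inside the actor

    def _run_step_b(self, batch):
        return batch  # large-model inference (GPU)

    def __call__(self, batch):
        batch = self._run_step_a(batch)
        batch = self._cpu_intensive_middle(batch)
        return self._run_step_b(batch)

# ds = ds.map_batches(MergedStep, num_gpus=1, concurrency=8, batch_size=16)
```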
Another potential benefit of a balanced strategy is that it facilitates a modular implementation of the processing steps, which makes debugging easier and allows steps to be composed flexibly as needed.
Yeah, makes sense. Another thing you could try is allocating 1 GPU to the smaller model and increasing its batch size, and setting 7 GPUs for the later stage. In theory the throughputs should balance out, but it depends on your application.
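Something along these lines (exact batch sizes are just placeholders you'd tune):

```python
# 1 GPU for the small model, with a larger batch size to boost its throughput
ds = ds.map_batches(StepA, num_gpus=1, concurrency=1, batch_size=512)

# 7 GPUs for the heavier later stage
ds = ds.map_batches(StepB, num_gpus=1, concurrency=7, batch_size=16)
```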
Let me try again to clarify.
As I mentioned previously, step A uses a tiny, fast model, so it finishes much earlier than step B, which uses a giant, slow model. For instance, step A finishes processing all the data in 1 hour, but step B needs another 9 hours to finish.
In this case, if I take this strategy, GPU #0 will only be running for the first hour (doing step A) and will idle for the remaining 9 hours. So effectively the workload of step B is handled by 7 GPUs, while one extra GPU is available but unused for those 9 hours.
This is what I meant by “the speed of doing step B is approximately 7/8”.
Anyway, my current mitigation is to:
- assign step A num_gpus=0.01 and concurrency=1
- assign step B num_gpus=0.95 and concurrency=8
This way, all 8 GPUs share the workload of the compute-intensive step B. It's just not elegant or robust, I presume.
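In code, the workaround looks roughly like this (batch sizes are placeholders):

```python
# One StepA actor with a tiny GPU request; it lands on GPU 0.
ds = ds.map_batches(StepA, num_gpus=0.01, concurrency=1, batch_size=64)

# Eight StepB actors at 0.95 GPU each: since 2 x 0.95 > 1, at most one fits
# per GPU, so one is placed on each of the 8 GPUs (GPU 0 holds 0.01 + 0.95).
ds = ds.map_batches(StepB, num_gpus=0.95, concurrency=8, batch_size=16)
```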