Jobs with fractional GPU usage are not spread across GPUs evenly

Hi,

I noticed that tasks with fractional GPU requirements are scheduled such that they first fully use one GPU before moving on to the next one. This means that some GPUs are 100% loaded while others stay idle.

For example, if I have 4 GPUs and run 4 tasks with num_gpus=0.5, two of the GPUs are fully loaded with two tasks each, and the other two remain idle. This is not the behaviour I want: one task per GPU runs faster than two tasks per GPU, so leaving GPUs idle makes no sense.
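
For reference, here is a minimal sketch of how I observe the assignment (the task name check_gpu is just illustrative): each task sleeps so that all four overlap, and reports the GPU IDs Ray assigned to it.

import time
import ray

ray.init()

# Each task reports the GPU IDs Ray assigned to it.
@ray.remote(num_gpus=0.5)
def check_gpu():
    time.sleep(5)
    return ray.get_gpu_ids()

print(ray.get([check_gpu.remote() for _ in range(4)]))
# Prints something like [[0], [0], [1], [1]], i.e. GPUs 2 and 3 stay idle.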

Is there some option I could set to get my desired behaviour of spreading work across GPUs more evenly?

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

There are two options:

  1. If you think the task should get its own GPU, why not just set num_gpus=1?

  2. You can use the SPREAD scheduling strategy.

@ray.remote(num_gpus=0.5, scheduling_strategy="SPREAD")
def gpu_task():
    pass

https://docs.ray.io/en/master/ray-core/tasks/scheduling.html

Thanks, I’ll try the SPREAD strategy and report back.

The reason I’m not setting num_gpus=1 is this: I want to train 12 networks on 4 GPUs (I’m in a PBT-like scenario). With num_gpus=1, they’ll be trained in three batches: 4, 4, 4. By setting num_gpus=0.5, I can get two batches: 8, 4. This is faster in terms of wall-clock time, because training two networks on a single GPU is not two times slower than training only one network, but it is somewhat slower. So after 8 networks are finished, I want the remaining 4 to be trained as fast as possible, and that means having them on separate GPUs.

(To be clear: there are no actual “batches”. I schedule all networks to be trained simultaneously so that there are no synchronization barriers, but since the networks take approximately the same time to train, it’s easier for me to think in terms of “batches”.)
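
To make the timing argument concrete, here’s a rough back-of-the-envelope sketch; the unit time T and the 1.5x slowdown for sharing a GPU are illustrative assumptions, not measurements:

# Wall-clock estimate for training 12 networks on 4 GPUs.
# Assumptions (illustrative only): one network alone takes T = 1.0,
# and two networks sharing a GPU each take 1.5 * T.
T = 1.0
shared = 1.5 * T

full_gpu = 3 * T          # num_gpus=1: three batches of 4          -> 3.0 T
packed = 2 * shared       # num_gpus=0.5, last 4 packed 2 per GPU   -> 3.0 T
spread = shared + T       # num_gpus=0.5, last 4 spread 1 per GPU   -> 2.5 T
print(full_gpu, packed, spread)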

Hi,

Unfortunately, the SPREAD strategy didn’t change anything. Here’s a minimal example - I run exactly this code on a machine with 4 GPUs, and the tasks are scheduled only on the first two GPUs (two tasks on each GPU). If I increase the number of tasks, the rest of the GPUs are used, so the GPUs are clearly available to Ray.

Could you advise me on what I should try next?

import time
import torch
import ray

@ray.remote(num_gpus=0.5, scheduling_strategy="SPREAD")
def fun():
    torch.zeros((10, 10)).cuda()
    time.sleep(5)

ray.init()

futures = [fun.remote() for i in range(4)]
print([ray.get(f) for f in futures])

Hi @AwesomeLemon,

SPREAD won’t work in this case, since it spreads tasks across nodes instead of across GPUs on a single node.


@AwesomeLemon
Unfortunately, this is a known behavior quirk when fractional GPUs are involved in scheduling. For now, my suggestion is to create a placement group whose bundles each contain 1 GPU, and schedule each task against a bundle index.

from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

pg = placement_group([{"GPU": 1}] * 4, strategy="SPREAD")
results = [
    fun.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=pg,
            # This is the index into the original bundle list.
            # It is set to -1 by default, which means any available bundle.
            placement_group_bundle_index=0  # Index of the GPU bundle is 0.
        )
    ).remote() for _ in range(4)
]

Placement Groups — Ray 2.0.0 has a bit more information.

Thanks for your responses @jjyao @Chen_Shen

Unfortunately, the solution suggested by @Chen_Shen didn’t work for me; the processes are still not spread across the GPUs. I tried tinkering with it to no avail: I set placement_group_bundle_index=i (instead of 0), and I had to specify the number of CPUs in each bundle, since the tasks cannot be scheduled otherwise. Am I missing something?

pg = placement_group([{'CPU': 4, 'GPU': 1}] * 4, strategy="SPREAD")
ray.get(pg.ready())
futures = [fun.options(scheduling_strategy=PlacementGroupSchedulingStrategy(
                       placement_group=pg, placement_group_bundle_index=i)).remote()
           for i in range(4)]
print([ray.get(f) for f in futures])

Hi @AwesomeLemon,

Your observations are correct: you need to use placement_group_bundle_index=i and set the CPU count for each bundle. Each bundle maps to one GPU, so i is essentially the GPU index.

Hi @jjyao

Good to hear that I made the correct changes - but as I mentioned in the previous post, it still didn’t work… Do you have an idea why?
(If it helps, my Ray version is 2.0.0.)

Hi @AwesomeLemon,

You are right. Apparently I also misunderstood how placement groups are implemented; this cannot achieve your goal. I’ll check with the team to see if it’s a bug.

At this point, the only way I can think of is doing the spread yourself (assuming you are on a single node):

import os

@ray.remote(num_gpus=0.5)
def gpu_task(index):
    # Override the CUDA_VISIBLE_DEVICES set by Ray.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(index % 4)
    # actual gpu work
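
The tasks would then be launched with explicit indices, e.g. (a sketch for 12 tasks on a single 4-GPU node):

futures = [gpu_task.remote(i) for i in range(12)]
ray.get(futures)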

Hi @jjyao

Thanks for your response, I’d be interested in hearing if this is indeed a bug.

Unfortunately, manual spreading is not an option for me, since the code needs to be able to run on multiple nodes.

Yea, I think it’s a bug and I created an issue for it: [Core] GPU placement group doesn't honer the bundle index · Issue #29811 · ray-project/ray · GitHub


Is your cluster static? As a workaround for now, you can probably do a two-level spread: use NodeAffinitySchedulingStrategy to spread tasks across nodes, and manually override CUDA_VISIBLE_DEVICES to spread across GPUs within a single node.
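
Here’s a rough sketch of that idea; the GPUS_PER_NODE constant and the round-robin assignment are assumptions about your setup, not built-in Ray behaviour:

import os
import ray
from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy

ray.init()

GPUS_PER_NODE = 4  # assumption about your machines

@ray.remote(num_gpus=0.5)
def gpu_task(local_gpu_index):
    # Override the CUDA_VISIBLE_DEVICES set by Ray to force a specific GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(local_gpu_index)
    # actual gpu work

node_ids = [n["NodeID"] for n in ray.nodes() if n["Alive"]]

# Round-robin the tasks across nodes first, then across GPUs within each node.
futures = [
    gpu_task.options(
        scheduling_strategy=NodeAffinitySchedulingStrategy(
            node_id=node_ids[i % len(node_ids)], soft=False
        )
    ).remote((i // len(node_ids)) % GPUS_PER_NODE)
    for i in range(12)
]
ray.get(futures)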