How to distribute actors across multiple GPUs

Hi,
I want to distribute actors across multiple GPUs.
For example, I have 4 GPUs and want to spawn 4 actors, so it should end up as 1 actor per GPU.
That sounds simple if I can just use the argument "num_gpus=1".
However, I have other actors to assign as well.
So, let's assume I have 4 actors to be distributed and another 4 actors that also use the GPUs.
When I use num_gpus=0.5, Ray sometimes puts 2 actors from the same group on 1 GPU.
Maybe if I use num_gpus=0.6 for the actors to be distributed and num_gpus=0.4 for the other 4 actors, I can make it work.
However, if the number of actors is not fixed, I cannot rely on fixed fractions like that.
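
To make it concrete, here is a minimal sketch of the setup I have in mind (MainActor and HelperActor are just placeholder names for my two groups of actors):

```python
import ray

ray.init()  # assume a machine with 4 GPUs

# Placeholder names: the first group of 4 actors I want spread out,
# and the second group of 4 actors that also needs GPU time.
@ray.remote(num_gpus=0.5)
class MainActor:
    def run(self):
        ...

@ray.remote(num_gpus=0.5)
class HelperActor:
    def run(self):
        ...

mains = [MainActor.remote() for _ in range(4)]
helpers = [HelperActor.remote() for _ in range(4)]
# With num_gpus=0.5, Ray only guarantees half a GPU per actor; it may put
# two MainActors on the same GPU instead of pairing one MainActor with
# one HelperActor on each GPU.
```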

This topic is related to the previous topic ( How to assign a specific actor to a specific GPU ), but I cannot use the accelerator type (all GPUs are the same).

Thanks!

cc @Alex Can you answer this question?

Hey, I’m not 100% sure I understand the question, but it sounds like you have a set of actors and you want them to be on different GPUs?

Do you mind describing why you want to do this? If each actor is only supposed to use part of a GPU, why can’t they be on the same GPU?

@Alex
Hi, in this case, the first group of actors and the second group of actors have different execution times.
Therefore, it is better to distribute the actors so that all GPUs have the same actor configuration.

Is there any update on this? I would like to do the same thing with either remote functions or actors for a data-parallel task, and I’m working in a homogeneous GPU setup (8 V100s). I know that Ray was mainly meant to improve task parallelism, but I’m interested in exploring ways to accelerate data parallelism with GPUs, especially with the Ray communication library: Ray Collective Communication Lib — Ray 1.12.0. Even in the examples in the documentation, I find that operations are placed on only 1 GPU when I have multiple GPUs available. Naively doing data parallelism with Ray remote functions will be slower due to serialization/deserialization through the object store and data transfers between GPU and CPU memory without utilizing NVLink connections, especially when communication is needed in algorithms like distributed DGEMM.

Much appreciated!
Brian Park

cc @matthewdeng do you know the answer to this question?

You can use placement groups to force placement here: Placement Groups — Ray 1.12.0

For example, if you want, say, 4 actors to be on the same GPU, then you could create a placement group with a {"GPU": 1} bundle. Then the actors can be targeted at that bundle using GPUActor.options(placement_group=pg, placement_group_bundle_index=0). This lets you force the actors to use the same GPU.

No other actors will be scheduled onto that bundle (it’s reserved) unless explicitly targeted.

Note that the actors still need to fit in the placement group bundle (total num_gpus <= 1 if you create a 1 GPU bundle). Hope this helps.
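
A minimal sketch of what that looks like (the GPUActor class and its exact resource numbers are just illustrative):

```python
import ray
from ray.util.placement_group import placement_group

ray.init()

# One bundle that reserves a whole GPU (plus CPUs for the actors).
pg = placement_group([{"CPU": 4, "GPU": 1}])
ray.get(pg.ready())  # wait for the reservation to be placed

@ray.remote(num_cpus=1, num_gpus=0.25)
class GPUActor:
    def gpu_ids(self):
        return ray.get_gpu_ids()

# All four actors fit in the 1-GPU bundle (4 * 0.25 <= 1), so they are
# forced onto the same physical GPU; nothing else can use that bundle.
actors = [
    GPUActor.options(
        placement_group=pg,
        placement_group_bundle_index=0,
    ).remote()
    for _ in range(4)
]
print(ray.get([a.gpu_ids.remote() for a in actors]))
```

You can create one such placement group per GPU and target each group of actors at its own bundle to get the per-GPU layout you want.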
