Hi,
I want to distribute actors across multiple GPUs.
For example, if I have 4 GPUs and spawn 4 actors, it should be 1 actor per GPU.
This sounds simple, since I can just use the argument num_gpus=1.
However, I have other actors to assign as well.
So, let's assume I have 4 actors to be distributed and another 4 actors that also use the GPU.
When I use num_gpus=0.5, Ray may place 2 actors from the same group on 1 GPU.
Maybe if I use num_gpus=0.6 for the actors to be distributed and num_gpus=0.4 for the other 4 actors, I can make it work.
However, if the number of actors is not fixed, I cannot use fixed fractions like these.
@Alex
Hi, in this case the first group of actors and the second group have different execution times.
Therefore, it would be better to distribute the actors so that every GPU has the same actor configuration.
Is there any update on this? I would like to do the same thing with either remote functions or actors for a data-parallel task, and I'm working in a homogeneous GPU setup (8 V100s). I know that Ray was mainly meant to improve task parallelism, but I'm interested in exploring ways to accelerate data parallelism with GPUs, especially with the Ray communication library: Ray Collective Communication Lib — Ray 1.12.0. Even in the examples in the documentation, I find that operations are placed on only 1 GPU when I have multiple GPUs available. Naively doing data parallelism with Ray remote functions will be slower due to serialization/deserialization through the object store and data transfers between GPU and CPU memory without utilizing NVLink connections, especially when communication is needed in algorithms like distributed DGEMM.
For example, if you want, say, 4 actors to be on the same GPU, you could create a placement group with a {"GPU": 1} bundle. Actors can then be targeted at that bundle using GPUActor.options(placement_group=pg, placement_group_bundle_index=0). This lets you force the actors onto the same GPU.
No other actors will be scheduled onto that bundle (it’s reserved) unless explicitly targeted.
Note that the actors still need to fit in the placement group bundle (total num_gpus <= 1 if you create a 1 GPU bundle). Hope this helps.