How to make all GPUs visible to each worker

daniel · September 23, 2021, 4:53pm

Is there any way to make all GPUs visible to each Ray actor or worker?

The current approach on a two-GPU machine looks like this:

@ray.remote(num_gpus=1)
class worker(object):

This lets each worker see only 1 GPU. Is there any way to let each worker see both GPUs?
Also, if I set @ray.remote(num_gpus=2), the program cannot launch with 2 workers, since it then requests 4 GPUs in total, which are not available on this machine.

yic · September 23, 2021, 6:17pm

How about num_gpus=0?

Btw, the resource requirement there is only used by Ray to schedule jobs. It does not prevent the user from using the resources, so there is no isolation.

daniel · September 23, 2021, 6:31pm

When I set num_gpus=0, printing the result of ray.get_gpu_ids() shows that no GPU is available.

yic · September 24, 2021, 3:19am

Maybe I missed something, but I thought you meant two actors accessing two GPUs, right?
Could you share more detail about how you plan to use ray.get_gpu_ids()?

daniel · September 24, 2021, 3:31am

Here is the application setting.

I want two actors to share two GPUs. At one point in time, actor1 uses GPU1 and actor2 uses GPU2. At another, actor1 uses GPU2 and actor2 uses GPU1.

Thanks!

yic · September 24, 2021, 3:41am

Hmmm, sorry, I still don't see why you need ray.get_gpu_ids() here.

In my mind, you just start two actors. Sometimes actor1 accesses gpu1 and actor2 accesses gpu2, and sometimes actor1 accesses gpu2 and actor2 accesses gpu1.

As I mentioned, the resource requirement is only used by Ray for scheduling and there is no isolation, so nothing prevents you from doing that. Could you explain more about why you need ray.get_gpu_ids()?
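
The alternating pattern discussed above can be sketched as a small scheduling function. This is only an illustration: `gpu_for_step` is a hypothetical helper, a simple round-robin swap is assumed, and devices are numbered from 0 as CUDA does (so the thread's GPU1 and GPU2 correspond to 0 and 1).

```python
def gpu_for_step(actor_index: int, step: int, num_gpus: int = 2) -> int:
    """Return the CUDA device index an actor uses at a given step.

    Assumption: actors rotate through the devices round-robin, so with
    two actors and two GPUs they swap devices on every step.
    """
    return (actor_index + step) % num_gpus


# Print the assignment for two actors over two steps.
for step in range(2):
    for actor_index in range(2):
        device = gpu_for_step(actor_index, step)
        print(f"step {step}: actor{actor_index + 1} -> device {device}")
```

Each actor would then select the returned index in whatever GPU library it uses, which only works if both devices are visible to its process.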

daniel · September 24, 2021, 3:45am

I use ray.get_gpu_ids() to check whether the current actor can find the GPUs I expect. For example, whether actor1 can find GPU2 even if it will not use it at the first time point.

yic · September 24, 2021, 3:48am

Got it. @sangcho I think you have more experience than me with this question. Could you take a look?

sangcho · September 24, 2021, 4:08am

Hmm, I am not 100% sure this is possible. cc @rliaw, do you know the common way to achieve this?

ericl · October 13, 2021, 1:42am

You can override the visible GPUs manually at the CUDA level, for example by setting os.environ["CUDA_VISIBLE_DEVICES"] = "0,1", or by unsetting that variable.
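
A minimal sketch of this suggestion, assuming the override is applied in the actor's constructor before any CUDA library initializes in that process. The helper `expose_gpus`, the factory `make_worker_class`, and the `Worker` class are hypothetical names for illustration. `num_gpus=0` is taken from the earlier suggestion in this thread so that two actors can be scheduled on a two-GPU machine. Ray is imported lazily so the helper can be tried without it.

```python
import os


def expose_gpus(device_ids=None):
    """Override CUDA_VISIBLE_DEVICES for the current process.

    device_ids: iterable of device indices, or None to unset the variable.
    Returns the string that was set, or None if the variable was unset.
    """
    if device_ids is None:
        os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        return None
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def make_worker_class():
    """Build a Ray actor class that exposes the requested GPUs to itself."""
    import ray  # imported here so expose_gpus works without Ray installed

    @ray.remote(num_gpus=0)  # reserve no GPU, so scheduling is not blocked
    class Worker:
        def __init__(self, device_ids):
            # Assumption: no CUDA context exists yet in this worker process.
            expose_gpus(device_ids)

        def visible_devices(self):
            # Report what CUDA libraries in this process would be told.
            return os.environ.get("CUDA_VISIBLE_DEVICES")

    return Worker
```

With Ray installed, `Worker = make_worker_class()` followed by `Worker.remote([0, 1])` creates such an actor. Note that ray.get_gpu_ids() reports the GPUs Ray assigned to the worker, so it is not expected to reflect this override. Checking the environment variable, or the device count reported by the CUDA library in use, is likely the more direct test.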
