How to make all GPUs visible to each worker

Is there any way to make all GPUs visible to each Ray actor or worker?

With the current approach on a two-GPU machine, for example:

@ray.remote(num_gpus=1)
class Worker(object):
    pass

With this, each worker can only see one GPU. Is there any way to let each worker see both GPUs?
Also, if I just set @ray.remote(num_gpus=2), the program cannot launch both workers, since together they request 4 GPUs, which are not available on this machine.

How about num_gpus=0?

Btw, the resource requirement there is only used by Ray to schedule jobs; it does not prevent the user from using the resources, so there is no isolation.
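To illustrate that scheduling point, here is a toy model in plain Python (a hypothetical sketch, not Ray's actual scheduler): `num_gpus` is treated as a logical count that is only subtracted from a pool when deciding whether an actor fits. `LogicalGpuPool` and `schedule` are made-up names for illustration.

```python
class LogicalGpuPool:
    """Toy model of logical GPU accounting (hypothetical, not Ray internals).

    It only tracks numbers; it never restricts which physical GPU a process uses.
    """

    def __init__(self, total_gpus):
        self.available = total_gpus

    def try_reserve(self, amount):
        # A request fits only if enough logical GPUs remain in the pool.
        if amount > self.available:
            return False
        self.available -= amount
        return True


def schedule(requests, total_gpus):
    """Return, for each actor's num_gpus request, whether it could be placed."""
    pool = LogicalGpuPool(total_gpus)
    return [pool.try_reserve(amount) for amount in requests]


# Two actors on a 2-GPU machine:
print(schedule([1, 1], total_gpus=2))  # [True, True]
print(schedule([2, 2], total_gpus=2))  # [True, False]: 4 GPUs requested, 2 exist
print(schedule([0, 0], total_gpus=2))  # [True, True]: nothing is reserved
```

In this model, num_gpus=0 always fits because nothing is reserved, which is why it was suggested; the trade-off is that the scheduler no longer tracks GPU usage for those actors, so sharing the GPUs is left entirely to your own code.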

When I set num_gpus=0, printing the result of ray.get_gpu_ids() shows no available GPUs.

Maybe I missed something, but I thought you meant two actors accessing two GPUs, right?
Could you share more detail about how you plan to use ray.get_gpu_ids()?

Here is the application setting.

I want two actors to share two GPUs: at one point in time, actor1 uses GPU1 and actor2 uses GPU2; at another, actor1 uses GPU2 and actor2 uses GPU1.
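One hypothetical way to express that swapping schedule is a simple rotation: at step `s`, actor `i` uses GPU `(i + s) % num_gpus`. This sketch uses 0-based CUDA device indices, so GPU1 and GPU2 in the post correspond to indices 0 and 1; the function name `gpu_assignment` is made up for illustration, and the real schedule may of course be driven by something other than a step counter.

```python
def gpu_assignment(step, num_actors, num_gpus):
    """Return a list where entry i is the GPU index actor i should use at `step`.

    Rotating by `step` gives every actor a different GPU at each step,
    as long as num_actors <= num_gpus.
    """
    if num_actors > num_gpus:
        raise ValueError("more actors than GPUs would force two actors onto one GPU")
    return [(actor + step) % num_gpus for actor in range(num_actors)]


for step in range(3):
    print(step, gpu_assignment(step, num_actors=2, num_gpus=2))
# 0 [0, 1]   actor1 -> GPU1, actor2 -> GPU2
# 1 [1, 0]   actor1 -> GPU2, actor2 -> GPU1
# 2 [0, 1]
```

Each actor would pick its device for the current step from this list; for that to work, both GPUs have to be visible inside each actor, which is the core of the question.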

Thanks!

Hmm, sorry, I still don't understand why you need ray.get_gpu_ids() here.

As I see it, you just start two actors, and inside them you sometimes have actor1 access GPU1 and actor2 access GPU2, and at other times actor1 access GPU2 and actor2 access GPU1.

As I mentioned:

So nothing prevents you from doing that. Could you explain more about why you need ray.get_gpu_ids()?

I use ray.get_gpu_ids() to check whether the current actor can find the GPUs I expect; for example, whether actor1 can see GPU2 even though it will not use it at the first point in time.
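For a check like this, it may help to separate what Ray assigned from what CUDA will actually expose. `ray.get_gpu_ids()` reports the GPUs Ray assigned to the actor, while CUDA libraries look at the `CUDA_VISIBLE_DEVICES` environment variable (which Ray sets for actors that request GPUs). The hypothetical helpers below read that variable, assuming it holds comma-separated integer indices rather than GPU UUIDs; the `env` parameter exists only so the sketch can be tried without a GPU.

```python
import os


def visible_gpu_ids(env=None):
    """Return the GPU indices CUDA would expose, or None if unrestricted.

    Assumes CUDA_VISIBLE_DEVICES holds integer indices such as "0,1".
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # Unset: CUDA exposes every device on the machine.
    # An empty string hides all devices, so it yields an empty list.
    return [int(token) for token in value.split(",") if token.strip()]


def can_see_gpu(gpu_index, env=None):
    """True if the GPU index (as written in CUDA_VISIBLE_DEVICES) is visible."""
    ids = visible_gpu_ids(env)
    return ids is None or gpu_index in ids


print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0"}))   # [0]
print(can_see_gpu(1, {"CUDA_VISIBLE_DEVICES": "0"}))    # False
print(can_see_gpu(1, {"CUDA_VISIBLE_DEVICES": "0,1"}))  # True
```

Inside an actor you would call `can_see_gpu(1)` with no `env` argument so that it inspects the actor's real environment.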

Got it. @sangcho, I think you have more experience with this than I do. Could you take a look?

Hmm, I am not 100% sure this is possible. cc @rliaw, do you know the common way to achieve this?

You can override the visible GPUs manually at the CUDA level, for example by setting os.environ["CUDA_VISIBLE_DEVICES"] = "0,1", or by unsetting that variable.
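A minimal sketch of that override, assuming it runs inside the actor's own process (for example at the top of its `__init__`) before any CUDA library such as PyTorch initializes, since CUDA reads `CUDA_VISIBLE_DEVICES` when it starts up in a process. `expose_gpus` and `unrestrict_gpus` are hypothetical helper names, and `env` defaults to `os.environ`.

```python
import os


def expose_gpus(gpu_ids, env=None):
    """Set CUDA_VISIBLE_DEVICES so that the listed GPU indices are visible."""
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env["CUDA_VISIBLE_DEVICES"]


def unrestrict_gpus(env=None):
    """Remove CUDA_VISIBLE_DEVICES so CUDA falls back to showing every GPU."""
    env = os.environ if env is None else env
    env.pop("CUDA_VISIBLE_DEVICES", None)


fake_env = {"CUDA_VISIBLE_DEVICES": "1"}   # e.g. what one actor started with
print(expose_gpus([0, 1], fake_env))       # 0,1
unrestrict_gpus(fake_env)
print("CUDA_VISIBLE_DEVICES" in fake_env)  # False
```

Because `num_gpus` is only scheduling bookkeeping, this should in principle let you keep `num_gpus=1` per actor (so both actors still fit on the two-GPU machine) while making both GPUs visible inside each one; deciding which actor uses which GPU at a given moment is then up to your own code.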