How to stop Ray from managing CUDA_VISIBLE_DEVICES?

Architect · December 23, 2022, 12:21pm

I’m using Ray on a cluster with a variable number of GPUs per node. I want to run a task on each node, and let it consume all the GPUs on that node. I have defined a custom node resource to make sure tasks are not ran in parallel on the same node. However, if I don’t set num_gpus for the remote function, Ray sets CUDA_VISIBLE_DEVICES to an empty string. So I’m forced to proved some num_gpus value, which leaves some nodes underutilized.
Could someone with any of these questions:

Can I specify a flexible number of GPUs?
or
How to stop Ray from editing CUDA_VISIBLE_DEVICES?

Thank you for your help!

cade · March 24, 2023, 5:59pm

Interesting case of resource requests on non-homogonous cluster cc @jjyao

@Architect not sure what you ended up going with but one workaround is to unset CUDA_VISIBLE_DEVICES at the beginning of task execution.

import os

@ray.remote
def task():
    del os.environ['CUDA_VISIBLE_DEVICES']
    # remaining code that uses all available GPUs on the node

Yard1 · March 24, 2023, 6:13pm

I think we have the RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES env var for that (set it to a non-null value in both the driver code and the runtime env)

Jules_Damji · March 24, 2023, 11:46pm

Thanks @Yard1 and @cade for your suggestions. @Architect Let us know if either of the suggestions worked for you to disable cuda visibility of devices.

Architect · April 13, 2023, 7:17pm

Hi all, thanks for your help. Indeed I ended up manually managing CUDA_VISIBLE_DEVICES. Did not know about RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES, this could have helped.

Topic		Replies	Views
Restrict GPUs to use Ray Tune	1	332	February 17, 2021
How to specify gpu on cluster Ray Core	2	444	March 15, 2021
How to specify GPU resources in terms of GPU RAM and not fraction of GPU Ray Core	3	590	November 26, 2021
How do you disable gpu from being utilized? RLlib	2	710	August 12, 2022
Using specific GPUs in a shared machine Ray Tune	6	2921	March 24, 2022

How to stop Ray from managing CUDA_VISIBLE_DEVICES?

Related topics