OpenCL, NVIDIA and Ray actors

drowenhorst · August 27, 2024, 7:45pm

How severely does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

Ray Core, Python, OpenCL, NVIDIA

This is more of a post for the next person who is searching for an answer. I have a program that sets up an OpenCL GPU platform/device/context/queue (using PyOpenCL) for a number of Ray actors. On macOS this worked fine (with both AMD and Apple silicon GPUs), but on Ubuntu with NVIDIA GPUs/drivers, the main Python process could see the NVIDIA OpenCL platform while the Ray actors could not see the OpenCL platforms.
This was UNTIL I set:
ray.init(num_gpus=1, runtime_env={"env_vars": {"CUDA_VISIBLE_DEVICES": "0"}})

This is for a single GPU, obviously; adjust the values of num_gpus and "CUDA_VISIBLE_DEVICES" as needed.
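For more than one GPU, the two values can be derived from a single list of device indices. The following is only a sketch: `gpu_init_kwargs` and `init_ray_for_gpus` are hypothetical helper names, and it assumes that a comma-separated index list is what you want in CUDA_VISIBLE_DEVICES. I have only confirmed the single-GPU case above.

```python
def gpu_init_kwargs(gpu_ids):
    """Build keyword arguments for ray.init() that expose the given GPU indices.

    Hypothetical helper: mirrors the single-GPU call from the post and
    generalizes it by joining the indices with commas.
    """
    ids = [int(i) for i in gpu_ids]
    if not ids:
        raise ValueError("expected at least one GPU index")
    return {
        "num_gpus": len(ids),
        "runtime_env": {
            "env_vars": {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in ids)}
        },
    }


def init_ray_for_gpus(gpu_ids):
    """Call ray.init() with the generated arguments (requires Ray to be installed)."""
    import ray  # imported lazily so gpu_init_kwargs can be used without Ray

    return ray.init(**gpu_init_kwargs(gpu_ids))
```

Calling `init_ray_for_gpus([0])` should be equivalent to the single-GPU ray.init call above.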

This must be set in ray.init. A previous version of the code set CUDA_VISIBLE_DEVICES when creating the Ray actor, but this no longer works.
Note that the main Python process/environment did not have CUDA_VISIBLE_DEVICES set, yet it could still initialize the OpenCL platforms.
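If you want to check whether you are hitting the same thing, a quick probe along these lines compares what the main process and a Ray actor each see. This is a rough sketch, not the code from my program: `opencl_report`, `compare_driver_and_actor`, and the `Probe` actor are made-up names, and the `actor_gpus` resource request is an assumption you should adjust to match your own actors. Call ray.init first with whatever settings you want to test.

```python
import os


def opencl_report():
    """Return what the current process sees: the env var and OpenCL platform names."""
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import pyopencl as cl

        report["platforms"] = [p.name for p in cl.get_platforms()]
    except ImportError:
        report["platforms"] = "pyopencl is not installed"
    except Exception as exc:  # pyopencl raises an error when no platform is found
        report["platforms"] = f"error: {exc}"
    return report


def compare_driver_and_actor(actor_gpus=0):
    """Print the report from the main process and from inside a Ray actor."""
    import ray  # imported lazily; requires Ray to be installed

    @ray.remote
    class Probe:
        def report(self):
            return opencl_report()

    print("driver:", opencl_report())
    # actor_gpus is an assumption: set it to what your real actors request.
    probe = Probe.options(num_gpus=actor_gpus).remote()
    print("actor: ", ray.get(probe.report.remote()))
```

If the driver line lists the NVIDIA platform and the actor line shows an error or an empty list, you are likely seeing the same behavior described here.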

It is unclear whether the same applies on Windows with NVIDIA, but as near as I can tell there is little harm in setting it regardless of platform/GPU type.
Anyway, this took me a while to debug. Hoping to save someone some time in the future.
