OpenCL, NVIDIA and Ray actors

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Ray Core, Python, OpenCL, NVIDIA

This is more of a post for the next person searching for an answer. I have a program that sets up an OpenCL GPU platform/device/context/queue (using PyOpenCL) for a number of Ray actors. On macOS this worked fine (with both AMD and Apple Silicon GPUs), but on Ubuntu with NVIDIA GPUs/drivers, the main Python process could see the NVIDIA OpenCL platform while the Ray actors could not see any OpenCL platforms.
This was UNTIL I set:
ray.init(num_gpus=1, runtime_env={"env_vars": {"CUDA_VISIBLE_DEVICES": "0"}})

This is for a single GPU; for more GPUs, adjust the values of num_gpus and "CUDA_VISIBLE_DEVICES" to match.
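For anyone who wants to try this quickly, here is a rough sketch of how the setting could be built for one or more GPUs and checked from inside an actor. The names gpu_init_kwargs, probe_actor_platforms and PlatformProbe are made up for illustration and are not part of Ray or PyOpenCL. The sketch assumes GPU indices are plain integers and that both libraries are installed on the machine where the probe runs.

```python
def gpu_init_kwargs(gpu_ids):
    """Build keyword arguments for ray.init() (hypothetical helper).

    CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    """
    ids = [str(i) for i in gpu_ids]
    return {
        "num_gpus": len(ids),
        "runtime_env": {"env_vars": {"CUDA_VISIBLE_DEVICES": ",".join(ids)}},
    }


def probe_actor_platforms(gpu_ids=(0,)):
    """Start Ray with the env var set and list OpenCL platforms from an actor."""
    # Imported here so gpu_init_kwargs() can be used without Ray installed.
    import ray

    ray.init(**gpu_init_kwargs(gpu_ids))

    @ray.remote
    class PlatformProbe:  # hypothetical actor, for illustration only
        def platform_names(self):
            # Runs in the actor process, which is where the platforms were missing.
            import pyopencl as cl
            return [p.name for p in cl.get_platforms()]

    try:
        probe = PlatformProbe.remote()
        return ray.get(probe.platform_names.remote())
    finally:
        ray.shutdown()
```

Calling probe_actor_platforms() on the affected machine should return the platform names the actor can see, which makes it easy to compare against what the main process reports.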

This must be set at ray.init. A previous version of the code set CUDA_VISIBLE_DEVICES when creating the Ray actor, but this no longer works.
Note that the main Python process/environment did not have CUDA_VISIBLE_DEVICES set, yet it could still initialize the OpenCL platforms.

It is unclear whether the same applies on Windows with NVIDIA, but as far as I can tell there is little harm in setting it regardless of platform or GPU type.
Anyway – took me a while to debug. Hoping to save someone in the future.
