Is it possible to run Ray only using GPUs and no CPUs?

Is there any way to run Ray without CPU usage, i.e. without running operations on the CPUs of the system? Setting ray.init(num_cpus=0) doesn’t avoid CPU usage. I want to evaluate model training and inference, and although I set all CPU resource configurations so that the number of CPUs is 0, I still see Ray processes running on all the CPUs of the system I’m using. So is there any way to completely avoid CPU usage when running Ray and run all processes only on GPUs?

Thanks in advance


Hey javigm98,

Thanks for reaching out! There is some amount of CPU that Ray will always use, since some of its background processes (the raylet, GCS, monitors) always run on CPUs. This usage, however, is pretty small and is needed to coordinate the execution of the program.

Now concerning your actual program (the tasks or actors): if a task has num_gpus > 0, Ray will set CUDA_VISIBLE_DEVICES, which instructs the deep learning library used in the task (if any) to use the GPU. If there is lots of pure Python code in the task, or the task is not using a deep learning library (or another GPU-enabled library), it will still use CPUs. You can find more information about this in GPU Support — Ray v2.0.0.dev0.
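To illustrate the mechanism, here is a minimal sketch in plain Python (no Ray required) of what happens conceptually: Ray sets CUDA_VISIBLE_DEVICES to the GPU ids assigned to the task, and GPU-enabled libraries read that variable to decide which devices they may use. The helper names here are hypothetical, not Ray APIs:

```python
import os

def set_visible_gpus(gpu_ids):
    # Roughly what Ray does before running a task with num_gpus > 0:
    # restrict the process to the GPUs the scheduler assigned to it.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)

def visible_gpus():
    # Roughly what a deep learning library does on startup: it only
    # sees (and uses) the devices listed in CUDA_VISIBLE_DEVICES.
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(i) for i in value.split(",") if i != ""]

set_visible_gpus([0, 2])
print(visible_gpus())  # -> [0, 2]
```

Note that this only constrains which GPUs are visible; any pure Python code in the task still executes on a CPU core.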

The num_cpus=0 flag to ray.init makes sure Ray will not schedule CPU tasks (tasks have @ray.remote(num_cpus=1) by default). Tasks with num_cpus=0 will still be scheduled, but they may still use some CPU, even if you set @ray.remote(num_gpus=1), if they are not offloading their computation to CUDA.

I hope this makes things clearer!



Hi @pcmoritz and thank you so much for your answer. So, to make everything clear: if I want to run training in RLlib with TensorFlow as the underlying framework with the lowest CPU usage possible, the key point is to start Ray with the flag num_cpus=0 and then configure the agent by setting the resource keys num_cpus_for_driver=0 and num_cpus_per_worker=0 (and setting num_gpus and num_gpus_per_worker to be non-zero). I only use the RLlib training API, and I don’t know if these settings are enough to reduce CPU usage to the minimum, since I can’t add @ray.remote decorators.

P.S.: To put everything in context, what I want to do is train a PPO agent with TF as the underlying framework while reducing CPU usage as much as possible, in order to compare performance when training with low CPU usage and GPUs against training with no GPUs.
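For reference, the kind of configuration I mean would look roughly like this (assuming the resource keys discussed above; the worker and GPU counts are placeholders, not tested values):

```python
# Hypothetical RLlib resource settings aiming at minimal CPU usage.
config = {
    "framework": "tf",
    "num_gpus": 1,             # GPUs reserved for the driver/learner
    "num_workers": 1,          # rollout workers
    "num_gpus_per_worker": 1,  # give each worker a GPU
    "num_cpus_per_worker": 0,  # reserve no CPUs for workers
    "num_cpus_for_driver": 0,  # reserve no CPUs for the driver
}

print(config["num_cpus_per_worker"])  # -> 0
```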

Thanks in advance!