Is it possible to run Ray only using GPUs and no CPUs?

Is there any way to run Ray without CPU usage, i.e. without running operations on the CPUs of the system? Setting ray.init(num_cpus=0) doesn’t avoid CPU usage. I want to evaluate model training and inference, and although I set all CPU resource configurations so that the number of CPUs is 0, I still see Ray processes running on all the CPUs of the system I’m using. So is there any way to completely avoid CPU usage when running Ray and run all processes only on GPUs?

Thanks in advance


Hey javigm98,

Thanks for reaching out! There is some amount of CPU that Ray will always use, since some of its background processes (the raylet, GCS, monitors) always run on CPUs. This usage, however, is pretty small and is needed to coordinate the execution of the program.

Now concerning your actual program (the tasks or actors): if a task has num_gpus > 0, Ray will set CUDA_VISIBLE_DEVICES, which instructs the deep learning library used in the task (if any) to use the GPU. If there is lots of pure Python code in the task, or the task is not using a deep learning library (or another GPU-enabled library), it will still use CPUs. You can find more information about this in GPU Support — Ray v2.0.0.dev0.
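To illustrate the mechanism, here is a minimal sketch in plain Python (no Ray required) of what happens conceptually: Ray sets CUDA_VISIBLE_DEVICES to the GPU ids assigned to the task, and GPU-enabled libraries read that variable to decide which devices they may use. The helper names here are hypothetical, not Ray APIs:

```python
import os

def set_visible_gpus(gpu_ids):
    # Roughly what Ray does before running a task with num_gpus > 0:
    # restrict the process to the GPUs the scheduler assigned to it.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)

def visible_gpus():
    # Roughly what a deep learning library does on startup: it only
    # sees (and uses) the devices listed in CUDA_VISIBLE_DEVICES.
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(i) for i in value.split(",") if i != ""]

set_visible_gpus([0, 2])
print(visible_gpus())  # -> [0, 2]
```

Note that this only constrains which GPUs are visible; any pure Python code in the task still executes on a CPU core.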

The num_cpus=0 flag to ray.init makes sure Ray will not schedule CPU tasks (tasks have @ray.remote(num_cpus=1) by default). Tasks with num_cpus=0 will still be scheduled, but they may still use some CPU, even if you set @ray.remote(num_gpus=1), if they are not offloading their computation to CUDA.

I hope this makes things clearer!



Hi @pcmoritz and thank you so much for your answer. So, to make everything clear: if I want to run training in RLlib with TensorFlow as the underlying framework with the lowest CPU usage possible, the key point is to start Ray with the flag num_cpus=0 and then configure the agent by setting the resource keys num_cpus_for_driver=0 and num_cpus_per_worker=0 (and setting num_gpus and num_gpus_per_worker to be non-zero). I only use the RLlib training API, and I don’t know if these settings are enough to reduce CPU usage to the minimum, since I can’t add @ray.remote decorators.

P.S.: To put everything in context, what I want to do is train a PPO agent with TF as the underlying framework while reducing CPU usage as much as possible, in order to compare performance when training with low CPU usage and GPUs against training with no GPUs.
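For reference, the kind of configuration I mean would look roughly like this (assuming the resource keys discussed above; the worker and GPU counts are placeholders, not tested values):

```python
# Hypothetical RLlib resource settings aiming at minimal CPU usage.
config = {
    "framework": "tf",
    "num_gpus": 1,             # GPUs reserved for the driver/learner
    "num_workers": 1,          # rollout workers
    "num_gpus_per_worker": 1,  # give each worker a GPU
    "num_cpus_per_worker": 0,  # reserve no CPUs for workers
    "num_cpus_for_driver": 0,  # reserve no CPUs for the driver
}

print(config["num_cpus_per_worker"])  # -> 0
```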

Thanks in advance!