Running DD-PPO on multiple GPUs

Hi all!

I am trying to run the DD-PPO algorithm, giving each worker 1 GPU, using this configuration file:
https://github.com/ray-project/ray/blob/master/rllib/tuned_examples/ppo/atari-ddppo.yaml
Since my machine has 4 GPUs, I set num_workers: 4.
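For context, the relevant part of my setup looks roughly like the sketch below. This is not a verbatim copy of the linked tuned example; the env name and the keys other than num_workers are illustrative assumptions based on RLlib's DD-PPO config options:

```yaml
# Sketch of a DD-PPO tuned-example config (illustrative values):
atari-ddppo:
    env: BreakoutNoFrameskip-v4
    run: DDPPO
    config:
        framework: torch
        # One rollout worker per GPU on a 4-GPU machine:
        num_workers: 4
        # Each worker should get its own GPU:
        num_gpus_per_worker: 1
        # DD-PPO learns on the workers, so the driver takes no GPU:
        num_gpus: 0
```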

The problem is that all 4 workers end up on the same GPU instead of each using its own.
When I print the resources assigned to the workers, I get the following output for each of them:

{'CPU_group_05d7394261ee785f26194bc7e08bb664': [(0, 1.0)], 'GPU_group_05d7394261ee785f26194bc7e08bb664': [(0, 1.0)]}

I understand that Ray then sets CUDA_VISIBLE_DEVICES=0 for every worker, which would explain why they all use GPU 0.
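To illustrate the remapping effect I suspect here: CUDA_VISIBLE_DEVICES restricts which physical GPUs a process can see, and the visible ones are renumbered starting from logical index 0. The helper below is a hypothetical sketch of that mapping, not Ray or CUDA code:

```python
import os

def visible_devices(env_value):
    """Return the physical GPU ids a process can see, in logical order.

    With CUDA_VISIBLE_DEVICES="2", a process sees one GPU whose logical
    index is 0 but whose physical id is 2. Returns None when the variable
    is unset, meaning all GPUs are visible.
    """
    if env_value is None:
        return None
    return [int(d) for d in env_value.split(",") if d != ""]

# If Ray sets CUDA_VISIBLE_DEVICES="0" for every worker, each worker's
# logical device 0 maps to the same physical GPU 0:
assert visible_devices("0") == [0]

# Whereas distinct per-worker values would map logical device 0 to
# different physical GPUs, e.g. "1" -> physical GPU 1:
assert visible_devices("1") == [1]
```

So if all four workers receive the value "0", their frameworks will all allocate on the same physical device even though each believes it is using "its" GPU 0.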

Why does the GPU_group always show GPU 0 for every worker? Is this expected behavior?

Thanks!

Yes, this is expected behavior. It is the model replicas that get distributed across the GPUs.

Then how can multiple workers be distributed across multiple GPUs?