[rllib] Performance of PPO with two GPUs is worse than using only one GPU

Shanchao_Yang · August 31, 2021, 1:52pm

Hello,

I am using PPO with two GPUs on my task. Interestingly (and sadly :( ), the task can be solved on either GPU 0 or GPU 1 alone, but when I use both GPUs together, learning fails.

I suspect the problem may be caused by the latest RLlib update that speeds up multi-GPU training ([RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU)).
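For anyone trying to narrow this down, it may help to recall what synchronous multi-GPU training is supposed to do in principle. The sketch below is a generic illustration of data-parallel gradient averaging, not RLlib's actual MultiGPUTrainOneStep code: a minibatch is split into equal shards (one per device), each shard's mean gradient is computed, and the shard gradients are averaged. The toy model (scalar linear regression with mean squared error) and the helper names `mean_grad` and `data_parallel_grad` are hypothetical. With equal shard sizes and a mean-reduced loss, the averaged result matches the single-device gradient up to floating-point error.

```python
# Generic illustration of synchronous data-parallel gradient averaging.
# Hypothetical toy model: scalar linear regression y ~ w * x with mean squared error.
# This is NOT RLlib's implementation, just the underlying idea.


def mean_grad(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, num_devices):
    """Split the batch into equal shards (one per 'device'), then average the shard gradients."""
    assert len(xs) % num_devices == 0, "equal shard sizes assumed"
    shard = len(xs) // num_devices
    grads = [
        mean_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    return sum(grads) / num_devices


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
w = 0.3

one_gpu = data_parallel_grad(w, xs, ys, num_devices=1)
two_gpus = data_parallel_grad(w, xs, ys, num_devices=2)
print(one_gpu, two_gpus)  # equal up to floating-point rounding
```

If the two-GPU run behaves very differently from the one-GPU run, this suggests checking whether something other than the device count changed, for example the effective per-update batch size or how gradients are combined across devices.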

So, I am wondering whether anyone else is struggling with the same problem, or whether some of you have trained successfully with two GPUs.

Sorry, my task is a little complicated to share, but I can provide more debugging information if needed.

Thanks in advance.

hossein836 · January 3, 2022, 8:19am

I have the same problem. I wonder if you found any solution?
