[rllib] PPO performs worse with two GPUs than with one GPU

Hello,

I am using PPO with two GPUs on my task. Interestingly, and sadly :(, the task can be solved on either GPU 0 or GPU 1 alone, but when I use both GPUs together, learning fails.

I suspect the problem may be caused by the latest RLlib update that speeds up training with multiple GPUs ("[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU)").

So I am wondering whether anyone else is struggling with the same problem, or whether anyone has trained successfully with two GPUs.

Sorry, my task is a bit too complicated to share, but I can provide more debugging information if needed.

Thanks in advance.