I am using PPO with two GPUs on my task. Interestingly (and sadly :( ), the task can be solved when training on either GPU 0 or GPU 1 alone, but learning fails when I use both GPUs together.
I think the problem may be related to the latest RLlib update, which speeds up training with multiple GPUs: "[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU)".
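In case it helps anyone reproduce the comparison, here is a minimal sketch of how the runs differ. It is not my actual task: the `BASE_CONFIG` values and the `make_variants` helper are hypothetical, and I am assuming the dict-style RLlib config keys `framework`, `seed`, and `num_gpus`.

```python
from copy import deepcopy

# Hypothetical base settings; the real environment and model are omitted.
# Assumption: dict-style RLlib config with "framework", "seed", "num_gpus".
BASE_CONFIG = {
    "framework": "torch",
    "seed": 0,  # fixed seed so the runs differ only in GPU count
}


def make_variants(base):
    """Return named configs that differ only in the number of GPUs."""
    overrides = {
        "one_gpu": {"num_gpus": 1},
        "two_gpus": {"num_gpus": 2},
    }
    # Copy the base for each variant so the variants never share state.
    return {name: {**deepcopy(base), **extra} for name, extra in overrides.items()}


if __name__ == "__main__":
    for name, cfg in make_variants(BASE_CONFIG).items():
        print(name, cfg)
```

Each variant would then be passed as the PPO trainer config together with the real environment, so the GPU count is the only difference between the runs.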
So I am wondering whether anyone else is struggling with the same problem, or whether any of you have trained successfully with two GPUs.
Sorry, my own task is a little too complicated to share, but I can provide more debug information if needed.
Thanks in advance.