Hi, I was told that when using the PyTorch backend for PPO on a GPU, the training batch is copied to the GPU multiple times (once per minibatch update), which slows down learning. I just wanted to know whether the latest version of RLlib has fixed this issue.
In PPO, the minibatch passes over the training data happen here: ray/train_ops.py at 3e010c5760c99be5a9940001f33db087c52eb8e7 · ray-project/ray · GitHub
Based on the code, it looks like the batch is already loaded onto the GPU.
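For readers who want to see the pattern in isolation, here is a minimal sketch of "copy the batch to the device once, then slice minibatches on the device". It is not RLlib's actual code; the function name `minibatch_passes` and its parameters are made up for illustration, and the loss/optimizer step is left as a comment.

```python
import torch


def minibatch_passes(batch, num_sgd_iter, minibatch_size, device):
    """Hypothetical helper: run minibatch passes over a batch loaded once."""
    # Single host-to-device copy of the whole train batch.
    batch = {k: v.to(device) for k, v in batch.items()}
    n = next(iter(batch.values())).shape[0]
    samples_seen = 0
    for _ in range(num_sgd_iter):
        # Shuffle indices on the device so slicing needs no further copies.
        perm = torch.randperm(n, device=device)
        for start in range(0, n, minibatch_size):
            idx = perm[start:start + minibatch_size]
            minibatch = {k: v[idx] for k, v in batch.items()}
            # A real trainer would compute the loss on `minibatch` here
            # and take an optimizer step.
            samples_seen += idx.shape[0]
    return samples_seen


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_batch = {"obs": torch.randn(10, 4), "actions": torch.zeros(10)}
total = minibatch_passes(toy_batch, num_sgd_iter=3, minibatch_size=4, device=device)
```

The slow variant would instead call `.to(device)` inside the inner loop, which repeats the transfer for every minibatch update.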
Thank you, the latest version of RLlib does seem to have fixed that problem.
Hey @psxz, yes, we fixed that problem in the latest master. All torch algos now use the unified multi-GPU execution op, which was previously only available for tf.
This change will be included in the upcoming Ray 1.6 release.
We measured a speed increase of roughly 35% for PPO + 1 GPU on Atari because of this change.
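As a rough illustration of the setup being discussed, a PPO config for torch on a single GPU might look like the sketch below. The keys are standard RLlib 1.x PPO settings, but the environment name and the values are only assumptions for the example, not the configuration used for the measurement above.

```python
# Illustrative RLlib 1.x-style PPO config (values are assumptions).
config = {
    "env": "BreakoutNoFrameskip-v4",  # assumed Atari env, for illustration only
    "framework": "torch",
    "num_gpus": 1,
    "train_batch_size": 4000,
    "sgd_minibatch_size": 128,
    "num_sgd_iter": 30,
}

# Approximate number of full minibatch updates per training iteration.
# If the batch were re-copied on every update, this is roughly how many
# extra host-to-device transfers each iteration would incur.
updates_per_iter = config["num_sgd_iter"] * (
    config["train_batch_size"] // config["sgd_minibatch_size"]
)

# Usage, assuming Ray 1.x is installed:
# from ray.rllib.agents.ppo import PPOTrainer
# trainer = PPOTrainer(config=config)
# result = trainer.train()
```

With these example values there are several hundred minibatch updates per iteration, which is why loading the batch onto the GPU only once can matter.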
Thank you, that will be quite helpful.