PPO with PyTorch backend slow on GPU for Ray 1.0

psxz · August 8, 2021, 7:43pm

Hi, I was told that when using the PyTorch backend for PPO on a GPU, the training batch is copied to the GPU multiple times (once per minibatch update), which slows down learning. Has the latest version of RLlib fixed this issue?

michaelzhiluo · August 12, 2021, 8:50am

In PPO, the minibatch passes over the training data happen here: ray/train_ops.py at 3e010c5760c99be5a9940001f33db087c52eb8e7 · ray-project/ray · GitHub

Based on the code, it looks like the batch is already loaded onto the GPU.
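
To illustrate the difference being discussed, here is a minimal PyTorch sketch of the "copy once, slice on the device" pattern. This is not RLlib's actual code: the function name `minibatch_epochs` and its parameters are hypothetical, chosen only for illustration, and the example falls back to CPU when no GPU is available.

```python
import torch


def minibatch_epochs(batch, num_sgd_iter, minibatch_size, device):
    """Yield minibatches for several SGD epochs, copying the batch to `device` once.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    # Single host-to-device transfer for the whole train batch.
    on_device = {k: v.to(device) for k, v in batch.items()}
    n = next(iter(on_device.values())).shape[0]
    for _ in range(num_sgd_iter):
        # Shuffle indices on the device so slicing needs no further host copies.
        perm = torch.randperm(n, device=device)
        for start in range(0, n, minibatch_size):
            idx = perm[start:start + minibatch_size]
            yield {k: v[idx] for k, v in on_device.items()}


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = {"obs": torch.randn(128, 4), "advantages": torch.randn(128)}
minibatches = list(
    minibatch_epochs(batch, num_sgd_iter=3, minibatch_size=32, device=device)
)
```

The slow variant would instead call `.to(device)` inside the inner loop, which repeats the host-to-device transfer for every minibatch update.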

psxz · August 12, 2021, 12:47pm

Thank you, the latest version of RLlib does seem to have fixed that problem.

sven1977 · August 12, 2021, 1:21pm

Hey @psxz, yes, we fixed that problem in the latest master. All torch algos now use the unified multi-GPU exec. op (which was previously only available for tf).
This change will be included in the upcoming Ray 1.6 release.

We measured a speed increase of roughly 35% for PPO + 1 GPU on Atari because of this change.
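
For reference, a minimal sketch of the settings that matter for this topic, assuming the dict-style PPO config keys of Ray 1.x (`framework`, `num_gpus`, `train_batch_size`, `sgd_minibatch_size`, `num_sgd_iter`). The values are illustrative, and the update count is a rough estimate rather than RLlib's exact bookkeeping.

```python
# Illustrative PPO settings; keys assume the Ray 1.x dict-style config.
ppo_config = {
    "framework": "torch",       # use the PyTorch backend
    "num_gpus": 1,              # one GPU for the learner
    "train_batch_size": 4000,   # timesteps collected per training iteration
    "sgd_minibatch_size": 128,  # size of each minibatch update
    "num_sgd_iter": 30,         # passes (epochs) over the train batch
}

# Rough number of minibatch updates per train batch. If the data were
# re-copied to the GPU on every update, this is how many transfers
# a single training iteration would incur.
updates_per_train_batch = ppo_config["num_sgd_iter"] * (
    ppo_config["train_batch_size"] // ppo_config["sgd_minibatch_size"]
)
```

With numbers in this range, one training iteration performs several hundred minibatch updates, which is why repeated host-to-device copies can add up.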

psxz · August 12, 2021, 4:51pm

Thank you, that will be quite helpful.
