PPO training takes twice as long on GPU as on CPU

Siddharth_Jain · June 3, 2022, 5:32pm

Hello everyone,

I am running the cartpole-ppo example on Ray 1.12.1 with PyTorch 1.11.0, and I see a significant difference in learn throughput between GPU and CPU. I tried different values of sgd_minibatch_size with and without a GPU, keeping train_batch_size at 4000 and num_workers at 1. The results are as follows:

Mini-batch size	Learn throughput (num_gpus=1)	Learn throughput (num_gpus=0)
64	1310	3458
128	2867	70007
256	6159	13731
512	12952	27711
1000	21824	43353

So training without a GPU is faster than training with 1 GPU at every mini-batch size I tried.

I observed similar behavior in my own project, where I use a PPO agent with a custom model and a custom environment on Ray 1.4.1 with PyTorch 1.7.0.

Can someone please help identify why training on GPU is slower? What can I do to reduce my training time?

Any help is appreciated!
Thanks in advance!

gjoliver · June 4, 2022, 6:04pm

There is a certain amount of overhead that comes with using a GPU; for example, the tensors need to be copied to and from the GPU.
The forward and backward computation therefore needs to be significant enough before it is worth the overhead.
I can reproduce the slowdown with the default network of a single hidden layer of size 32.
If you simply change the hidden layers to, for example, [1024, 1024, 1024], you will notice that the GPU is 2x to 4x faster than the CPU.
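A minimal sketch of what that comparison could look like as config dicts. The keys are the Ray 1.12-style PPO settings named in this thread (num_gpus, num_workers, train_batch_size, sgd_minibatch_size, model.fcnet_hiddens); the helper `make_config` and the mini-batch size of 128 are just illustrative choices, not part of RLlib or the tuned example.

```python
import copy

# Settings taken from the original post; sgd_minibatch_size=128 is one of
# the values tried there and is picked here only as an example.
BASE_CONFIG = {
    "framework": "torch",
    "num_workers": 1,
    "train_batch_size": 4000,
    "sgd_minibatch_size": 128,
    "model": {"fcnet_hiddens": [32]},
}


def make_config(num_gpus, hiddens):
    """Hypothetical helper: copy BASE_CONFIG and override GPU count and hidden layers."""
    config = copy.deepcopy(BASE_CONFIG)
    config["num_gpus"] = num_gpus
    config["model"]["fcnet_hiddens"] = list(hiddens)
    return config


# Small network on CPU vs. large network on GPU.
small_cpu = make_config(num_gpus=0, hiddens=[32])
large_gpu = make_config(num_gpus=1, hiddens=[1024, 1024, 1024])

# To actually train (assumes Ray 1.12.x with RLlib and PyTorch installed):
#   from ray.rllib.agents.ppo import PPOTrainer
#   trainer = PPOTrainer(env="CartPole-v0", config=large_gpu)
#   result = trainer.train()
```

Running each network size with num_gpus=0 and num_gpus=1 and comparing the reported learn throughput should show whether the GPU overhead is being amortized on your machine.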
Hope this helps.

Jimmy · June 4, 2022, 9:05pm

I agree with @gjoliver. RL is also quite CPU-intensive. For the CartPole example, the bottleneck is not in the NN parts. In contrast, in image-based RL examples such as Atari games, where you might need a bigger network like a CNN, the GPU is certain to be faster.
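To put rough numbers on the network-size point, here is a small sketch that counts the weights and biases of a fully connected network for CartPole (4 observation values, 2 actions). It assumes a plain MLP with a single output head; RLlib's actual model also has a value-function branch, so treat the counts as an order-of-magnitude comparison only. `mlp_param_count` is a hypothetical helper, not an RLlib function.

```python
def mlp_param_count(input_dim, hiddens, output_dim):
    """Count weights plus biases of a fully connected network."""
    sizes = [input_dim, *hiddens, output_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


OBS_DIM, NUM_ACTIONS = 4, 2  # CartPole observation and action sizes

small = mlp_param_count(OBS_DIM, [32], NUM_ACTIONS)
large = mlp_param_count(OBS_DIM, [1024, 1024, 1024], NUM_ACTIONS)

print(small, large)  # 226 2106370
```

With only a couple of hundred parameters, each SGD step does very little arithmetic, so the fixed cost of moving data to and from the GPU can easily dominate.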
