Hello everyone,
I am running the cartpole-ppo example on Ray 1.12.1 and PyTorch 1.11.0, and I observed significant differences in learn throughput between GPU and CPU. I experimented with different `sgd_minibatch_size` values, with and without a GPU, keeping `train_batch_size` at 4000 and `num_workers` at 1. The results are as follows:
| Mini-batch size | Learn throughput (num_gpus=1) | Learn throughput (num_gpus=0) |
|---|---|---|
| 64 | 1310 | 3458 |
| 128 | 2867 | 70007 |
| 256 | 6159 | 13731 |
| 512 | 12952 | 27711 |
| 1000 | 21824 | 43353 |
In every case, training without the GPU is faster than training with one GPU.
I observed similar behavior in my own project, where I am using a PPO agent with a custom model and a custom environment on Ray 1.4.1 with PyTorch 1.7.0.
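For reference, here is a minimal sketch of the config I am benchmarking (key names per the Ray 1.12 RLlib API; the env name and the trainer lines are my assumptions for the cartpole example, and the trainer calls are commented out so the sketch stands alone):

```python
# Sketch of the PPO benchmark config (Ray 1.12 RLlib, torch framework).
config = {
    "env": "CartPole-v0",       # assuming the standard Gym cartpole env
    "framework": "torch",
    "num_workers": 1,
    "train_batch_size": 4000,
    "sgd_minibatch_size": 64,   # varied over 64, 128, 256, 512, 1000
    "num_gpus": 1,              # set to 0 for the CPU runs
}

# from ray.rllib.agents.ppo import PPOTrainer
# trainer = PPOTrainer(config=config)
# result = trainer.train()
# # learn throughput shows up in the result dict's timing info in my runs
```

I read the learn throughput off the training result dict after each iteration.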
Can someone please help me identify why training on the GPU is slower? What can I do to reduce my training time?
Any help is appreciated!
Thanks in advance!