Hi,
When I run training with PPO or APPO on my machine (Ubuntu, one 15 GB GPU, 16 CPU cores, 65 GB of CPU RAM), I hit out-of-memory errors late in the training process. I am using the standard API with num_gpus=1 and num_workers=10. GPU memory stays stable the whole time, but after some debugging I can watch each rollout worker bloat from ~2 GB to well over 5 GB of CPU RAM, at which point the server chokes and kills the process.
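For reference, the setup described above roughly corresponds to a config like this. The env id and framework are placeholders/assumptions (I'm guessing tf2, since tf1 didn't show the problem); only num_gpus and num_workers are the actual values from the run:

```python
# Sketch of the relevant trainer config. "MyCustomEnv-v0" is a hypothetical
# stand-in for the custom environment; "framework" is an assumption.
config = {
    "env": "MyCustomEnv-v0",  # hypothetical custom env id
    "framework": "tf2",       # assumption: tf2 (leak not seen under tf1)
    "num_gpus": 1,            # single 15 GB GPU for the learner
    "num_workers": 10,        # 10 rollout workers, ~2 GB CPU RAM each at start
}
```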
I have not been able to figure out the cause yet. I am using a custom environment, but I have not seen this issue when training other agents on it, and I did not have this issue with tf1. Any thoughts?
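In case anyone wants to reproduce the measurement: the per-worker growth I mentioned can be tracked with a stdlib-only sketch like the one below (Linux-only, since it reads /proc; psutil would be the portable alternative). The PID would come from whatever the worker processes report:

```python
import os

def rss_mb(pid: int) -> float:
    """Current resident set size of process `pid` in MB, via Linux /proc.

    /proc/<pid>/statm's second field is the number of resident pages.
    """
    with open(f"/proc/{pid}/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 ** 2)

# Example: check our own process; for the leak, poll each worker PID
# periodically and log the trend.
print(f"current RSS: {rss_mb(os.getpid()):.1f} MB")
```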