Expected RAM usage for PPOTrainer (debugging memory leaks)

I’m using the latest Ray (1.11) and noticed the PPO trainer uses a very large amount of RAM, roughly 4 GB per worker, with an apparent memory leak: training always crashes with an OOM about 500-600k environment steps in. Does anyone have suggestions for debugging this? My rollout buffer size is 4096, and my observations are an RGBD image plus a 2D bird's-eye map with two channels (128 x 128 x 3: float32, 128 x 128 x 1: float32, 128 x 128 x 2: float32).
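For scale, here is a back-of-the-envelope estimate of the raw observation bytes in one full rollout buffer, using the shapes and dtype above (this covers observations only; actions, rewards, model state, and any Ray object-store copies come on top of it):

```python
# Back-of-the-envelope estimate of raw observation memory per rollout buffer.
# Shapes and dtype (float32 = 4 bytes) are taken from the setup above.
BYTES_PER_FLOAT32 = 4
obs_shapes = [
    (128, 128, 3),  # RGB image
    (128, 128, 1),  # depth channel
    (128, 128, 2),  # 2-channel bird's-eye map
]

def n_elems(shape):
    total = 1
    for dim in shape:
        total *= dim
    return total

bytes_per_step = sum(n_elems(s) for s in obs_shapes) * BYTES_PER_FLOAT32
buffer_bytes = bytes_per_step * 4096  # rollout buffer size

print(f"per step:   {bytes_per_step} bytes ({bytes_per_step / 2**10:.0f} KiB)")
print(f"per buffer: {buffer_bytes / 2**30:.2f} GiB")
```

So one full buffer of observations alone is about 1.5 GiB, which makes a 4 GB per-worker baseline plausible even before any leak.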

Can you share the script that you use to launch your experiment? That would be a good starting point for helping you out.

If you don’t want to share your environment, you can substitute it with RLlib's RandomEnv:

If you are ok downloading our assets (please message me on slack if you have an issue), I made this reproduction:

I’ll try to reproduce on random env.

This takes up roughly 18 GB for me; does that sound about right?

Thanks for sharing, I’ll take a look.