PPO with PyTorch on GPU has a RAM memory leak in Ray 1.6.0

Hi,

I am seeing a steady increase in RAM usage while training with multi-agent PPO on a GPU with Ray 1.6, while GPU memory usage stays stable.
Running the same training on CPU with Ray 1.0.0, I see no memory issues.
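Roughly, the setup looks like the sketch below (simplified: a single-agent CartPole-v0 env stands in for my actual multi-agent env and config, and host RAM is tracked with psutil; this is not my exact script):

```python
import os

import psutil
import ray
from ray.rllib.agents.ppo import PPOTrainer  # Ray 1.6.0 API

ray.init()

trainer = PPOTrainer(
    env="CartPole-v0",       # stand-in for the real multi-agent env
    config={
        "framework": "torch",
        "num_gpus": 1,       # model on GPU; GPU memory stays flat
        "num_workers": 2,    # rollout workers on CPU
    },
)

proc = psutil.Process(os.getpid())
for i in range(200):
    trainer.train()
    # RSS of the driver/trainer process grows steadily from iteration to
    # iteration; worker processes can be watched with htop or `ray memory`.
    print(f"iter {i}: driver RSS = {proc.memory_info().rss / 1e9:.2f} GB")
```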

Any ideas for a possible solution would be appreciated.

Thank you,
priyam

I've seen a similar leak with Ray 1.6 Torch SAC, with the replay buffer set to 500k:
[Plot: RAM usage growing over time with Ray 1.6 Torch SAC and a 500k replay buffer]
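For context, a sketch of that SAC setup, assuming the Ray 1.6 `buffer_size` config key and Pendulum-v0 as an assumed env (the actual env isn't relevant to the leak):

```python
import ray
from ray.rllib.agents.sac import SACTrainer  # Ray 1.6.0 API

ray.init()

trainer = SACTrainer(
    env="Pendulum-v0",           # assumed env for the sketch
    config={
        "framework": "torch",
        "num_gpus": 1,
        "buffer_size": 500_000,  # the 500k replay buffer from the plot above
    },
)

# Host RAM climbs over the course of training even though the buffer size
# is bounded.
for _ in range(100):
    trainer.train()
```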

Eventually, on a 15 GB machine, I end up with:

```
PID    MEM      COMMAND
11076  7.48GiB  ray::SAC.train()
11077  4.34GiB  ray::RolloutWorker
```
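In case it helps anyone reproduce the measurement, a sketch of how per-process RSS like the table above can be collected with psutil (Ray rewrites its worker process titles to names like `ray::SAC.train()` and `ray::RolloutWorker`):

```python
import psutil

# List resident memory (RSS) of all Ray worker processes on this machine.
print(f"{'PID':>6}  {'MEM':>8}  COMMAND")
for p in psutil.process_iter(["pid", "name", "cmdline", "memory_info"]):
    cmd = " ".join(p.info["cmdline"] or []) or (p.info["name"] or "")
    if cmd.startswith("ray::"):
        rss_gib = p.info["memory_info"].rss / (1024 ** 3)
        print(f"{p.info['pid']:>6}  {rss_gib:.2f}GiB  {cmd}")
```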

Are you also using a GPU?

> Are you also using a GPU?

Yes.