Hello all! I’m running an experiment using PPO and tune.grid_search (no search algorithm) that, when all is said and done, will result in 60 runs, each with about 140 PPO iterations (4M timesteps). However, I’m running into an issue where my Ray object store fills up (“The local object store is full of objects that are still in scope and cannot be evicted”).
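For context, tune.grid_search produces one trial per combination of grid values (the Cartesian product), and the whole grid is repeated num_samples times. The sketch below does not call Ray. It only shows how a grid can reach 60 trials and what that means for the total timestep budget. The grid keys and values are hypothetical placeholders, not the ones used in this experiment, and the trial order may differ from Tune's.

```python
from itertools import product

# Hypothetical grid: in Tune, each list would be wrapped in tune.grid_search([...]).
grid = {
    "lr": [1e-4, 3e-4, 1e-3],
    "clip_param": [0.1, 0.2],
    "seed": list(range(10)),
}
NUM_SAMPLES = 1                  # tune.run(num_samples=...) repeats the full grid
TIMESTEPS_PER_TRIAL = 4_000_000  # roughly 140 PPO iterations per run


def expand_grid(grid):
    """Return one config dict per combination of grid values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


trials = expand_grid(grid) * NUM_SAMPLES
total_timesteps = len(trials) * TIMESTEPS_PER_TRIAL
print(len(trials), total_timesteps)  # 60 240000000
```

With 3 × 2 × 10 values the grid yields 60 trials, or about 240M environment timesteps in total, so any per-iteration result that stays in scope is multiplied across all of those runs.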
Upon checking the Ray Dashboard, I noticed that most of the object store usage was coming from trial_runner.py:_process_events:560 (Ray 1.2.0). Please see the screenshot below.
So far I’ve increased the Plasma object store to 400 GB using the object_store_memory argument to ray.init(), but it fills up again. Does anyone have an idea what might be preventing the TrialRunner results from being evicted?
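The object_store_memory argument to ray.init() is given in bytes, so a size like 400 GB has to be converted first. The minimal sketch below assumes binary gigabytes (GiB) and uses a hypothetical helper, gib_to_bytes. The actual ray.init call is left commented out so the snippet runs without Ray installed.

```python
def gib_to_bytes(gib):
    """Convert a size in GiB to bytes (assumes 1 GiB = 1024**3 bytes)."""
    return int(gib * 1024 ** 3)


init_kwargs = {"object_store_memory": gib_to_bytes(400)}

# import ray
# ray.init(**init_kwargs)
print(init_kwargs["object_store_memory"])  # 429496729600
```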