Env worker expiry time


I have a simulator that seems to have a memory leak and I get OOM errors during long trainings.

Is there a mechanism to set a limit for the env worker in terms of max episodes or max iters and then just recreate it from scratch?


You could try stopping training when it reaches X timesteps via the stop criteria in the config: ray/atari-impala-large.yaml at master · ray-project/ray · GitHub

Thanks, but I’m not trying to stop the training with timesteps_total but rather I’m trying to set a limit to the amount of time an env worker is used during a long training.

Hi @vakker00,

Not as far as I have been able to find. I had this same issue for a long time. Finally I just had to find the memory leak and fix it.

I did try writing a callback that would restart the environment after so many episodes. That did not work well because I had many workers all using the same environment. I did not have independent instances for each worker; instead, they all interacted with a central application through websockets.

Can you share more info about your environment and how it is set up?

Another approach you could try is to set it up as an external env. In that setup the trainer does not control the workers; it is purely a consumer of samples, so you could implement restart logic in your policy client.

Thanks for the suggestion. I just ended up explicitly deleting the wrapped simulator after a certain number of episodes, i.e. in my `env.reset` there's a check: if `self.episodes % threshold == 0`, delete the simulator (and force GC). That's a bit dirty, but it seems to do the trick until the memory leak is fixed.
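For anyone landing here later, the pattern above can be sketched roughly like this. `Simulator`, `RecyclingEnv`, and the `threshold` value are hypothetical stand-ins, not the actual code from this thread; a real env would also follow the full Gym `reset`/`step` API:

```python
import gc


class Simulator:
    """Stand-in for the real (leaky) simulator -- purely illustrative."""

    def __init__(self):
        self.buffer = []

    def reset(self):
        # Pretend each episode leaks a little memory.
        self.buffer.append(bytearray(1024))
        return 0.0  # dummy initial observation


class RecyclingEnv:
    """Env wrapper that tears down and rebuilds the wrapped simulator
    every `threshold` episodes, inside reset()."""

    def __init__(self, threshold=100):
        self.threshold = threshold
        self.episodes = 0
        self.recreations = 0  # only here so the behavior is observable
        self.sim = Simulator()

    def reset(self):
        self.episodes += 1
        if self.episodes % self.threshold == 0:
            # Drop the old simulator and force a collection so leaked
            # references are reclaimed before recreating it from scratch.
            del self.sim
            gc.collect()
            self.sim = Simulator()
            self.recreations += 1
        return self.sim.reset()
```

The key point is that the recycling happens between episodes (in `reset`), so no in-flight episode ever loses its simulator; the cost is one extra construction every `threshold` resets.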