Worker timeout and restart

Hello,

I currently have the problem that my RL training freezes after 1-3 days. I am unsure what causes it, but I would like to implement a timeout for the environment. So I have a couple of questions:

  • Is there already a timeout implemented in RLlib / Ray that can be used?
  • Is it possible to use that timeout to completely close and restart the worker?
  • If not, is Ray compatible with signals, or with the timeout-decorator package from PyPI?
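
To make the last question concrete, here is a minimal sketch of the kind of signal-based timeout I have in mind. It uses only Python's standard `signal` module. The names `StepTimeout`, `time_limit`, `slow_env_step`, and `run_with_timeout` are made up for illustration. It assumes a Unix system and that the code runs in the main thread of the process; whether that holds inside a Ray worker is exactly what I am unsure about.

```python
import signal
import time
from contextlib import contextmanager


class StepTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget (hypothetical name)."""


@contextmanager
def time_limit(seconds):
    """Raise StepTimeout if the body runs longer than `seconds`.

    Assumption: Unix only, and must be called from the main thread,
    because SIGALRM handlers can only be installed there.
    """
    def _on_alarm(signum, frame):
        raise StepTimeout(f"call exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def slow_env_step():
    """Stand-in for an env.step() call that hangs."""
    time.sleep(5)


def run_with_timeout():
    try:
        with time_limit(0.2):
            slow_env_step()
    except StepTimeout:
        return "timed out"
    return "finished"
```

In my case the body of the `with` block would be the environment's `step()` call, and the `except` branch is where I would like to shut down and restart the worker.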

Thanks in advance

Regards
Sebastian