I am new to RLlib and Ray. I have a simulation environment that must be stepped independently of RLlib, so I implemented it by inheriting from ExternalEnv. At startup I determine when each episode should begin. When one begins, I call start_episode, then periodically call get_action and log_returns as needed, and finally call end_episode with that episode id. Multiple episodes can be ongoing at the same time. So far, so good. Now I need to evaluate the trained policy on episodes that start at fixed times.
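For context, here is a stripped-down sketch of my environment (the spaces, the episode logic, and the sampled observations are placeholders; in reality the external simulation drives the timing and several episodes can be in flight at once):

```python
import gym
import numpy as np
from ray.rllib.env import ExternalEnv

class MySimEnv(ExternalEnv):
    """Simplified stand-in for my simulation-driven environment."""

    def __init__(self, env_config=None):
        super().__init__(
            action_space=gym.spaces.Discrete(2),
            observation_space=gym.spaces.Box(
                low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
        )

    def run(self):
        # A single sequential episode is shown for brevity; my real run()
        # starts episodes at simulation-determined times, possibly overlapping.
        while True:
            episode_id = self.start_episode()
            obs = self.observation_space.sample()  # stand-in for real sim state
            for _ in range(10):  # stand-in episode length
                action = self.get_action(episode_id, obs)
                # ... apply `action` to the external simulation ...
                obs = self.observation_space.sample()
                self.log_returns(episode_id, reward=0.0)
            self.end_episode(episode_id, obs)
```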
To accomplish this, I enabled evaluation in the config. RLlib initialized the environment on a separate evaluation worker and immediately called `run` on it in its own thread. Since my episode start times are predetermined, the start_episode and get_action calls happened before the trainer had completed enough training to start evaluation. Naturally, this caused the evaluation worker to crash with a queue.Empty exception.
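For reference, this is roughly how I enabled it (I'm on the older Trainer/config-dict API; the algorithm choice, env name, and the exact values are incidental):

```python
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer  # PPO as an example algorithm

ray.init()
tune.register_env("my_sim_env", lambda env_config: MySimEnv(env_config))

trainer = PPOTrainer(config={
    "env": "my_sim_env",
    "num_workers": 1,
    # Evaluate every 5 training iterations on a dedicated eval worker.
    # That worker constructs its own copy of the env and starts its
    # run() thread immediately, which is where my problem begins.
    "evaluation_interval": 5,
    "evaluation_num_workers": 1,
    "evaluation_num_episodes": 10,
})

for _ in range(100):
    trainer.train()
```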
My question is: what is the best way to do this? One workaround would be to catch the exception, wait, and retry, as sketched below. Is there a better way? Ideally, I would like to call `run` on my ExternalEnv only after the trainer process has started evaluation, and do this repeatedly whenever it needs to evaluate.
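For concreteness, the retry workaround I have in mind would look something like this inside run() (a sketch only; the 5-second backoff is arbitrary, and a real version would also need to discard the dangling episode):

```python
import queue
import time

def run(self):
    """Retry variant of my run() loop."""
    while True:
        episode_id = self.start_episode()
        obs = self.observation_space.sample()  # stand-in for real sim state
        try:
            action = self.get_action(episode_id, obs)
        except queue.Empty:
            # Evaluation hasn't actually started yet, so get_action timed
            # out waiting for the policy to answer. Back off and retry.
            # (A real version would also clean up the abandoned episode.)
            time.sleep(5.0)
            continue
        # ... step the simulation, log_returns, end_episode as before ...
```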