I have been using tune.run, and it has been working fine, but as of this morning, it gets ‘stuck’ after very few episodes - even though my stopping criterion has not changed. So an example would be, even though it still ‘appears’ to be training, the results show:
It looks like your environment is not terminating.If I remember correctly, the episode values (rew, len, etc…) are not logged until an episode ends.Without knowing more about your environment it is not possible to diagnose but it looks like the agent’s are in a state where they are not triggering the termination conditions for your environment.
Have you looked in tensorboard to see how the losses are behaving? Is it possible that the agents could end up in a buggy state in the environment were it cannot end? Is it a case where one agent gets a large negative reward when the environment finishes and it has learned a behavior that prevents the other agent from winning to prevent this. Can you “watch” what the agent’s are doing in an environment?
There is an rllib config entry you could use, “horizon” that will cause rllib to artificially terminate your episode after a max number of steps if that is desired. You can read more about it here : RLlib Sample Collection and Trajectory Views — Ray v2.0.0.dev0