Hi,
Say I’m running some A3C agents on an arbitrary environment using tune.run().
When I look at the reward-over-timesteps graphs I see that there are a lot of timesteps (10M), but how do I know when (if at all) the environment has been reset?
If I don’t call env.reset() in my own code, do the agents just keep taking steps in the environment until it’s done?
Generally an RLlib worker calls the reset() method of its environment instance after the environment returns done=True from the step() call. The dones are part of the experiences that your workers produce; you can access and log them if you are interested. Otherwise they are accumulated in their own metrics, e.g. episodes_this_iter!
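If you want to see exactly when those resets happen, a custom callback is one way to do it. Here is a minimal sketch assuming a Ray 1.x-style RLlib API (the class name EpisodeLogger is just illustrative):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class EpisodeLogger(DefaultCallbacks):
    """Logs every episode end, i.e. every point at which a worker is about to reset its env."""

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index, **kwargs):
        # Called once per finished episode (done=True), right before the worker resets the env.
        print(f"episode {episode.episode_id} ended after {episode.length} steps, "
              f"return={episode.total_reward}")
```

You can plug it in via `config={"callbacks": EpisodeLogger, ...}` in your tune.run() call. Either way, metrics like episodes_this_iter and episode_len_mean show up in the training results, so you can also read episode counts off the Tune output directly.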
@Ofir_Abu there is also a horizon key in the config. If you set it to an integer n, RLlib will artificially end the episode after that many environment steps, store a done=True, and call reset().
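For example, a minimal sketch (the env name and numbers here are just placeholders):

```python
from ray import tune

tune.run(
    "A3C",
    config={
        "env": "CartPole-v1",      # placeholder env
        "num_workers": 2,
        "horizon": 1000,           # force done=True after 1000 env steps per episode
        # "soft_horizon": False,   # default: the env really is reset at the horizon
    },
    stop={"timesteps_total": 10_000_000},
)
```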