When does an environment reset()?

Hi,
Say I’m running some A3C agents on an arbitrary environment using tune.run().
Looking at the reward-over-timesteps graphs I can see there are a lot of timesteps (10M), but how do I know when (or whether) the environment was reset?

If I don’t call env.reset() in my code, do the agents keep taking steps in the environment until they’re done?

Hi @Ofir_Abu,

Generally an RLlib worker calls the reset() method of its environment instance after the environment returns done=True from the step() call. The dones are part of the experiences that your workers produce. You can access these experiences and log them if you are interested. Otherwise they are accumulated in their own metric: episodes_this_iter.
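To make the reset-on-done behavior concrete, here is a minimal sketch of what a sampling loop does conceptually. This is illustrative only, not RLlib's actual implementation; ToyEnv is a hypothetical environment following the classic Gym API (obs, reward, done, info) that ends every episode after 5 steps.

```python
class ToyEnv:
    """Hypothetical environment: each episode ends after 5 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= 5  # episode boundary
        return self.t, 1.0, done, {}

env = ToyEnv()
obs = env.reset()
episodes_this_iter = 0
for _ in range(12):  # collect 12 timesteps, like a worker filling a batch
    obs, reward, done, info = env.step(0)
    if done:  # the worker, not your code, triggers the reset on done=True
        obs = env.reset()
        episodes_this_iter += 1

print(episodes_this_iter)  # 2 complete episodes fit inside 12 steps
```

The point is that you never call reset() yourself during training; the sampling loop does it whenever done=True comes back.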

Cheers


@Ofir_Abu there is also a horizon key in the config. If you set it to an integer n, RLlib will artificially end the episode after that many environment steps, store done=True, and call reset().

My code is as below:

for _ in range(1):  # <-- isn't this the episode number?
    result = agent.train()

Since the episode number is 1, shouldn’t the reset function be called ONCE? For me the reset function is called 3-4 times in this 1 episode.

Batch size for debug purposes is 128.

Can you please share your feedback ?

Hi @Archana_R,

Each call to train() collects train_batch_size new steps from the environments, so the loop counter is the number of training iterations, not episodes.

It will reset the environment as many times as it needs to in order to collect that many steps.

It resets the environment every time the environment returns done=True, or, if horizon is set, whenever the episode reaches that many steps.
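The arithmetic above explains the 3-4 resets you are seeing. A rough sketch, assuming a hypothetical average episode length of 40 steps (your actual episode length will differ):

```python
import math

train_batch_size = 128  # from the question above
episode_len = 40        # assumed average episode length for this sketch

# Every episode that starts while filling the batch begins with a reset,
# so the number of resets per train() call is roughly:
resets = math.ceil(train_batch_size / episode_len)
print(resets)  # 4 episodes are needed to cover 128 steps of 40-step episodes
```

So with short episodes relative to train_batch_size, several resets per train() call is exactly the expected behavior.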


This helps, thanks!