Resuming from a checkpoint with DQN and epsilon-greedy resets timesteps to 0

  • High: It blocks me from completing my task.

I also posted a GitHub issue about this: [RLlib] Resuming from checkpoint with DQN and epsilon greedy let timesteps start from 0 again · Issue #28289 · ray-project/ray · GitHub

Copied from the GitHub issue:
I run my code with a DQN algorithm and epsilon-greedy exploration, with resume="AUTO" and a checkpoint saved every iteration. I let it run for, say, 5 iterations, stop it, and start it again. The run resumes from the checkpoint, but epsilon starts at 1 again because the timesteps restart from 0.
In my experiments the performance also collapses completely, and it looks as if training starts from scratch.
I checked the same setup with PPO, and the problem does not occur there.

The expected behavior is that after resuming, the timesteps continue from the checkpointed value instead of 0, for example from 5000 (with 1000 timesteps per iteration and a checkpoint saved at iteration 5). Epsilon would then not restart at 1 but be set to the value corresponding to that timestep.
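To make the expected behavior concrete, here is a minimal sketch that assumes a simple linear epsilon decay. The function name `linear_epsilon` and the schedule values (1.0 down to 0.02 over 10,000 timesteps) are assumptions for illustration only, not RLlib's implementation or the settings from my config.

```python
def linear_epsilon(timestep, initial_epsilon=1.0, final_epsilon=0.02,
                   epsilon_timesteps=10_000):
    """Hypothetical linear epsilon schedule (illustrative values, not RLlib's code)."""
    # Fraction of the decay period that has elapsed, clipped to [0, 1].
    fraction = min(max(timestep, 0) / epsilon_timesteps, 1.0)
    # Interpolate linearly between the initial and final epsilon.
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


# 5 iterations of 1000 timesteps each were completed before the run was stopped.
resumed_timestep = 5 * 1000

# What I observe after resuming: the timestep counter is back at 0, so epsilon is 1.0.
print("epsilon at timestep 0:", linear_epsilon(0))

# What I expect: the counter continues at 5000, so epsilon is roughly 0.51 here.
print("epsilon at timestep 5000:", linear_epsilon(resumed_timestep))
```

With these assumed values, a restored timestep counter would put epsilon about halfway through its decay, whereas a counter reset to 0 makes the agent act fully at random again, which would fit the performance collapse described above.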

See the GitHub issue for a reproduction script.

Is there a workaround or quick fix I could apply, or does anyone have another idea for solving this?

Hi! This is indeed a bug, and I've triaged it on GitHub. Please check master in the coming days. :)
