- High: It blocks me from completing my task.

I also posted a GitHub issue about this: [RLlib] Resuming from checkpoint with DQN and epsilon greedy let timesteps start from 0 again · Issue #28289 · ray-project/ray · GitHub

Copied from the GitHub issue:

I start my code with a DQN algorithm using epsilon-greedy exploration, with resume="AUTO" and checkpointing every iteration. I let it run for, say, 5 iterations, stop it, and start it again. The run resumes from the checkpoint, but epsilon starts at 1 again because the timestep counter restarts from 0.

In my experiments the performance also collapses completely; it looks as if training starts from scratch.

I have checked the same setup with PPO, and there the problem does not occur.

The expected behavior is that the timesteps do not restart at 0 but continue from, for example, 5000 (with 1000 timesteps per iteration and a checkpoint saved at iteration 5), so that epsilon also does not restart at 1 but is set to the value corresponding to that timestep.
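To illustrate what I mean by "the value corresponding to this timestep", here is a minimal sketch of a linear epsilon-greedy schedule (the constants are example values, not my actual config):

```python
def epsilon_at(t, initial=1.0, final=0.02, epsilon_timesteps=10000):
    """Linearly anneal epsilon from `initial` to `final` over `epsilon_timesteps`."""
    frac = min(t, epsilon_timesteps) / epsilon_timesteps
    return initial + (final - initial) * frac

# After 5 iterations x 1000 timesteps, a resumed run should continue at
# epsilon_at(5000) ~= 0.51, not restart at epsilon_at(0) == 1.0.
```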

See the GitHub issue for a reproduction script.

Is there any workaround or quick fix I could apply, or any other idea how to solve this problem?
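One workaround I am considering, assuming the standard EpsilonGreedy `exploration_config` keys (`initial_epsilon`, `final_epsilon`, `epsilon_timesteps`), is to manually shift the schedule when resuming, so that the restarted (zero-based) timestep counter still produces the right epsilon. This is just a hedged sketch, not a verified fix:

```python
# Hypothetical workaround sketch (untested): on resume, lower
# initial_epsilon to where the schedule left off and shorten the
# remaining annealing window accordingly.
resumed_timesteps = 5000  # timesteps completed before the restart

config["exploration_config"] = {
    "type": "EpsilonGreedy",
    "initial_epsilon": 0.51,  # value the old schedule had reached at 5000 steps
    "final_epsilon": 0.02,
    "epsilon_timesteps": 10000 - resumed_timesteps,  # remaining annealing steps
}
```

This obviously requires editing the config by hand each time the run is restarted, so it is a stopgap at best.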