Resuming from checkpoint with DQN and epsilon greedy let timesteps start from 0 again

RaymondK · September 5, 2022, 11:32am

High: It blocks me from completing my task.

I also posted a GitHub issue about this: [RLlib] Resuming from checkpoint with DQN and epsilon greedy let timesteps start from 0 again · Issue #28289 · ray-project/ray · GitHub

Copied from the GitHub issue:
I run DQN with epsilon-greedy exploration, resume="AUTO", and a checkpoint after every iteration. I let it run for, say, 5 iterations, stop it, and start it again. The run resumes from the checkpoint, but epsilon starts at 1 again because the timestep counter restarts from 0.
In my experiments the performance also collapses completely, as if training had started from scratch.
I checked the same setup with PPO, and the problem does not occur there.

The expected behavior is that the timesteps continue from the checkpointed value rather than from 0, for example from 5000 (with 1000 timesteps per iteration and a checkpoint saved at iteration 5). Epsilon should then not start at 1 but be set to the value corresponding to that timestep.
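
To make the expected numbers concrete, here is a minimal sketch of a linear epsilon schedule in plain Python. It does not use RLlib; the function name `linear_epsilon` and the schedule values (decay from 1.0 to 0.02 over 10,000 timesteps) are assumptions chosen for illustration and may differ from the actual configuration.

```python
def linear_epsilon(timestep, initial_epsilon=1.0, final_epsilon=0.02,
                   epsilon_timesteps=10_000):
    """Linearly anneal epsilon from initial_epsilon to final_epsilon.

    All default values are assumed for illustration only.
    """
    # Fraction of the decay period already completed, clamped to [0, 1].
    fraction = min(max(timestep, 0) / epsilon_timesteps, 1.0)
    return initial_epsilon + fraction * (final_epsilon - initial_epsilon)


# Numbers from the example above: 1000 timesteps per iteration,
# checkpoint saved at iteration 5.
TIMESTEPS_PER_ITERATION = 1000
saved_iteration = 5
resumed_timestep = TIMESTEPS_PER_ITERATION * saved_iteration

# Expected: epsilon continues from the value for the restored timestep.
expected_epsilon = linear_epsilon(resumed_timestep)

# Observed: the counter restarts at 0, so epsilon is back at its initial value.
observed_epsilon = linear_epsilon(0)

print(f"resumed timestep: {resumed_timestep}")
print(f"expected epsilon: {expected_epsilon:.2f}")
print(f"observed epsilon: {observed_epsilon:.2f}")
```

Under these assumed values, a correct resume would continue with a noticeably lower epsilon than 1, while the reset counter puts exploration back at its initial level.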

See the GitHub issue for a reproduction script.

Is there a workaround or quick fix I could apply, or any other idea for how to solve this problem?

arturn · September 5, 2022, 7:43pm

Hi! This is indeed a bug, and I’ve triaged it on GitHub. Please check master in the coming days.
