Hey all! I am training RL agents in an environment with a "curriculum" variable that tracks the training timestep. Since I run my simulations on a cluster where jobs can get killed, I'd like to be able to resume my runs.

To set the right timestep in the environment after resuming, I use a callback that, at every episode step, sets this environment variable to the `global_timestep` of the RolloutWorker (`worker.policy_map['my_agent'].global_timestep`). However, this global timestep seems to be properly set only after the weights are updated. As a result, after resuming a run, the timestep variable restarts from 0 until the first weight update, and in evaluation it is never set to its proper value.

Is there a way to sync the workers (including the evaluation ones!) right after the restart, or maybe a better solution to my problem? Thank you!!!
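In case it helps discussion, here is a minimal, RLlib-independent sketch of one workaround I've been considering: persist the curriculum timestep myself, next to the checkpoint, instead of reading it from `global_timestep`. The `TimestepTracker` class and the file name `timestep.json` are my own invention, and the wiring into RLlib callbacks (e.g. stepping it per env step and saving it when checkpointing) is left out:

```python
import json
import os
import tempfile


class TimestepTracker:
    """Persist the curriculum timestep alongside the checkpoint so a
    resumed run continues from the saved value instead of 0.

    Standalone sketch -- hooking this into RLlib (stepping per env step,
    saving on checkpoint, sharing the path with evaluation workers) is
    up to the user.
    """

    def __init__(self, path):
        self.path = path
        self.timestep = 0
        # If the file exists, we are resuming: restore the saved value.
        if os.path.exists(path):
            with open(path) as f:
                self.timestep = json.load(f)["timestep"]

    def step(self):
        # Call once per environment step.
        self.timestep += 1

    def save(self):
        # Call whenever you write a checkpoint.
        with open(self.path, "w") as f:
            json.dump({"timestep": self.timestep}, f)


# First "run": train for 5 steps, then checkpoint.
path = os.path.join(tempfile.mkdtemp(), "timestep.json")
run1 = TimestepTracker(path)
for _ in range(5):
    run1.step()
run1.save()

# "Resumed" run: picks up at 5 immediately, before any weight update.
run2 = TimestepTracker(path)
print(run2.timestep)  # 5
```

Because the value is restored from disk in `__init__`, evaluation workers given the same path would also see the correct timestep right after restart, without waiting for a weight sync.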