Hey all! I am training RL agents in an environment with a "curriculum" variable that tracks the training timestep. I run my simulations on a cluster where jobs can get killed, so I'd like to be able to resume my runs. To set the right timestep in the environment after resuming, I am using a callback that, at every episode step, sets this environment variable to the global_timestep of the RolloutWorker's policy (worker.policy_map['my_agent'].global_timestep). However, it seems that global_timestep is only set properly after the weights are updated. As a result, after resuming a run, the timestep variable restarts from 0 until the first weight update, and on the evaluation workers it is never set to its proper value. Is there a way to sync the workers (including the evaluation ones!) right after the restart, or is there a better solution to my problem? Thank you!
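To make the setup concrete, here is a minimal sketch of the callback logic and of the kind of sync I am looking for. It deliberately does not import RLlib: StubPolicy, StubWorker, CurriculumEnv, set_curriculum_timestep, and sync_timestep are all hypothetical stand-ins I made up for illustration, and only the policy_map['my_agent'].global_timestep access mirrors my real code. In a real run, the loop in sync_timestep would have to be applied to both the training and the evaluation workers through whatever mechanism RLlib offers for that, which is exactly what I am asking about.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class StubPolicy:
    """Hypothetical stand-in for a policy that exposes global_timestep."""
    global_timestep: int = 0


@dataclass
class CurriculumEnv:
    """Hypothetical environment holding the curriculum variable."""
    curriculum_timestep: int = 0

    def set_curriculum_timestep(self, timestep: int) -> None:
        self.curriculum_timestep = timestep


@dataclass
class StubWorker:
    """Hypothetical stand-in for a rollout worker (not RLlib's class)."""
    policy_map: Dict[str, StubPolicy] = field(
        default_factory=lambda: {"my_agent": StubPolicy()}
    )
    env: CurriculumEnv = field(default_factory=CurriculumEnv)


def on_episode_step(worker: StubWorker, policy_id: str = "my_agent") -> None:
    """What my callback does: copy the policy's timestep into the env."""
    timestep = worker.policy_map[policy_id].global_timestep
    worker.env.set_curriculum_timestep(timestep)


def sync_timestep(
    workers: Iterable[StubWorker], timestep: int, policy_id: str = "my_agent"
) -> None:
    """Desired behaviour right after a restore: push the restored
    timestep to every worker before any episode step runs."""
    for worker in workers:
        worker.policy_map[policy_id].global_timestep = timestep
        worker.env.set_curriculum_timestep(timestep)
```

Without the sync step, on_episode_step keeps writing 0 into the environment after a restore, which is the behaviour I am seeing.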