Hi,
I’m training a PPO agent on an external simulator that occasionally crashes.
If I try to resume the training with ppo_trainer.restore(checkpoint), I see that cur_kl_coeff is set to its initial value from the config, and not to its value at the time of the checkpoint.
Is there a better way to resume an interrupted training run with all parameters restored to their exact state?
Are there any other parameters (ones that are not reported in TensorBoard) that are not restored by design, e.g. the optimizer state?
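One way to check this would be to snapshot the state just before saving and diff it against the state just after restoring. Below is a minimal sketch of that idea, assuming the state can be exported as a nested dict of plain Python values. The `diff_state` helper and the toy snapshots (including the numbers in them) are hypothetical and only for illustration; they are not part of RLlib, and how to obtain such a snapshot from the trainer is not shown here.

```python
# Sentinel so that a missing key is distinguishable from a stored None.
_MISSING = object()


def diff_state(before, after, prefix=""):
    """Return the dotted key paths whose values differ between two nested dicts.

    Assumes leaf values are plain Python objects that support `!=`
    (array-valued leaves would need an element-wise comparison instead).
    """
    changed = []
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}.{key}" if prefix else str(key)
        a = before.get(key, _MISSING)
        b = after.get(key, _MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections such as an optimizer sub-dict.
            changed.extend(diff_state(a, b, path))
        elif a != b:
            changed.append(path)
    return changed


# Toy snapshots with made-up values: one taken before the crash,
# one taken right after restoring from the checkpoint.
before = {"cur_kl_coeff": 0.45, "optimizer": {"lr": 5e-5, "step": 1200}}
after = {"cur_kl_coeff": 0.2, "optimizer": {"lr": 5e-5, "step": 1200}}

changed = diff_state(before, after)
print(changed)
```

Any path that shows up in the result would be a candidate for something that is reset rather than restored.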
Thanks!