PPO cur_kl_coeff is not restored


I’m training a PPO agent on an external simulator that occasionally crashes.
When I resume training with ppo_trainer.restore(checkpoint), cur_kl_coeff is reset to its initial value from the config rather than to the value it had when the checkpoint was saved.

Is there a better way to resume an interrupted training run with all parameters in their exact state?
Are there any other parameters, in particular ones not reported on TensorBoard, that are not restored by design (e.g. optimizer state)?
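
One way to look for values that are not restored is to take a snapshot of the trainer/policy state right before saving and another right after restoring, then diff the two. The helper below is only a sketch of that comparison. How the snapshots are obtained from the trainer is not shown, the names flatten and diff_states are hypothetical, the example dicts are made-up values, and it assumes the leaves are plain Python scalars (arrays would need an element-wise comparison instead of !=).

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for keys present in only one snapshot


def flatten(state: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested dict into {"a/b/c": value} so paths are easy to compare."""
    flat: Dict[str, Any] = {}
    for key, value in state.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_states(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {path: (value_before, value_after)} for every path that differs."""
    a, b = flatten(before), flatten(after)
    changed = {}
    for path in sorted(set(a) | set(b)):
        old, new = a.get(path, MISSING), b.get(path, MISSING)
        if old != new:
            changed[path] = (old, new)
    return changed


# Made-up snapshots: state before saving vs. state after restoring.
before = {"policy": {"cur_kl_coeff": 0.45, "lr": 5e-5}, "iteration": 120}
after = {"policy": {"cur_kl_coeff": 0.2, "lr": 5e-5}, "iteration": 120}

for path, (old, new) in diff_states(before, after).items():
    print(f"{path}: {old} -> {new}")
```

Any path that shows up in the diff is a candidate for state that the checkpoint does not round-trip.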


Can you file an issue at https://github.com/ray-project/ray/issues with a small reproducible script?
The kl_coeff schedule should be restored.
Do you use something like KLCoeffMixin?
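
To illustrate why the coefficient has to be part of the checkpointed state, here is a simplified, standalone sketch of an adaptive KL coefficient. It is not RLlib's code: the class AdaptiveKL and its get_state/set_state methods are hypothetical, and the update rule (scale by 1.5 when the sampled KL exceeds twice the target, halve it when it falls below half the target) is assumed here as the usual PPO heuristic.

```python
class AdaptiveKL:
    """Hypothetical stand-in for an adaptive KL penalty coefficient."""

    def __init__(self, initial_coeff: float, kl_target: float):
        self.coeff = initial_coeff
        self.kl_target = kl_target

    def update(self, sampled_kl: float) -> float:
        # Assumed heuristic: raise the penalty when KL overshoots the target,
        # lower it when KL is well below the target.
        if sampled_kl > 2.0 * self.kl_target:
            self.coeff *= 1.5
        elif sampled_kl < 0.5 * self.kl_target:
            self.coeff *= 0.5
        return self.coeff

    def get_state(self) -> dict:
        # The current coefficient must be saved, not just the config value.
        return {"coeff": self.coeff}

    def set_state(self, state: dict) -> None:
        self.coeff = state["coeff"]


# Training run: the coefficient drifts away from its initial value.
running = AdaptiveKL(initial_coeff=0.2, kl_target=0.01)
running.update(0.05)
running.update(0.05)
saved = running.get_state()

# Resuming from config alone starts again at the initial value ...
fresh = AdaptiveKL(initial_coeff=0.2, kl_target=0.01)
# ... while applying the saved state recovers the value from the run.
resumed = AdaptiveKL(initial_coeff=0.2, kl_target=0.01)
resumed.set_state(saved)

print(fresh.coeff, resumed.coeff, running.coeff)
```

If restore only rebuilds the object from the config (the fresh case), the penalty restarts at its initial value, which matches the behaviour described in the question.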

I opened a new issue, #22444.
Sorry it took a while.

I did not use the mixin.