PPO cur_kl_coeff is not restored

2dm · February 2, 2022, 5:47pm

Hi,

I’m training a PPO agent on an external simulator that may crash on some occasions.
If I try to resume the training with ppo_trainer.restore(checkpoint) I see that cur_kl_coeff is set to its initial value from the config, and not to its value at the time of the checkpoint.

Is there a better way to resume an interrupted training run with all parameters in their exact state?
Are there any other parameters not reported in TensorBoard (e.g. optimizer state) that are not restored by design?

Thanks!

gjoliver · February 7, 2022, 1:10am

Can you file an issue at https://github.com/ray-project/ray/issues with a small reproducible script?
The kl_coeff schedule should be restored.
Do you use something like KLCoeffMixin?

2dm · February 16, 2022, 8:25pm

I opened a new issue #22444.
Sorry it took a while.

I did not use the mixin.
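
For anyone reading this later, here is a minimal sketch of why the adaptive KL coefficient counts as training state that a checkpoint has to carry. It does not use RLlib at all: the AdaptiveKLCoeff class, save_checkpoint and restore_checkpoint are hypothetical names made up for illustration, and the update rule (multiply by 1.5 when the sampled KL is more than twice the target, halve it when it is below half the target) is only assumed to resemble the usual PPO adaptive-penalty scheme. The point is that a restore which only rebuilds the object from the config silently resets the coefficient, while one that also loads the saved state does not.

```python
import pickle


class AdaptiveKLCoeff:
    """Toy stand-in for a policy's adaptive KL penalty (hypothetical, not RLlib code)."""

    def __init__(self, kl_coeff=0.2, kl_target=0.01):
        # Initial values as they would come from the training config (assumed values).
        self.kl_coeff = kl_coeff
        self.kl_target = kl_target

    def update(self, sampled_kl):
        # Raise the penalty when the measured KL overshoots the target,
        # lower it when the measured KL is well below the target.
        if sampled_kl > 2.0 * self.kl_target:
            self.kl_coeff *= 1.5
        elif sampled_kl < 0.5 * self.kl_target:
            self.kl_coeff *= 0.5
        return self.kl_coeff

    def get_state(self):
        # Everything that changes during training must appear here.
        return {"kl_coeff": self.kl_coeff, "kl_target": self.kl_target}

    def set_state(self, state):
        self.kl_coeff = state["kl_coeff"]
        self.kl_target = state["kl_target"]


def save_checkpoint(coeff):
    """Serialize the mutable state to bytes."""
    return pickle.dumps(coeff.get_state())


def restore_checkpoint(blob, config_kl_coeff=0.2):
    """Rebuild from the config first, then overwrite with the saved state."""
    coeff = AdaptiveKLCoeff(kl_coeff=config_kl_coeff)
    coeff.set_state(pickle.loads(blob))
    return coeff


if __name__ == "__main__":
    coeff = AdaptiveKLCoeff()
    for sampled_kl in (0.05, 0.03, 0.001):
        coeff.update(sampled_kl)

    blob = save_checkpoint(coeff)
    config_only = AdaptiveKLCoeff()        # what a config-only restore would give
    restored = restore_checkpoint(blob)    # what a full restore should give

    print("at checkpoint:", coeff.kl_coeff)
    print("config only:  ", config_only.kl_coeff)
    print("restored:     ", restored.kl_coeff)
```

A quick way to check a real setup in the same spirit is to log the coefficient right before saving and right after restoring, and compare the two numbers.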
