Tune as part of curriculum training

I meant to ask whether all runs well without the checkpoint. But you answered that already!
In order to progress on this, you should cast this into a minimal reproduction script.
I don’t think it has anything to do with your environment, and it’s highly unlikely to be related to most of the configuration you do.
Since it is working on my end (saving and loading checkpoints with your script): Could you try to do this inside of one script with a toy environment and no further configuration and see if that runs?
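A minimal reproduction along those lines might look like the sketch below (an assumption on my part, not the original poster's script): one file, a toy environment, one training iteration, save, then restore. It assumes Ray/RLlib is installed and uses the `PPOConfig` builder API; call `main()` to run it.

```python
def main():
    # Imports kept inside main() so the sketch can be read without Ray installed.
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.algorithms.algorithm import Algorithm

    # Toy environment, no further configuration.
    config = PPOConfig().environment("CartPole-v1")
    algo = config.build()
    algo.train()          # one training iteration
    ckpt = algo.save()    # write a checkpoint (path or result object,
                          # depending on the RLlib version)
    algo.stop()

    # If the hang is reproducible outside your full setup,
    # it should show up on this line:
    restored = Algorithm.from_checkpoint(ckpt)
    restored.train()

# Call main() to run the reproduction.
```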

also, sorry about the delay in getting back.
one thing I noticed: in this callback, can you avoid reloading the whole algorithm?
the algorithm directory has a policies/ sub-dir, and it contains all the policies you have trained with that run of PPO. there is likely a default/ folder in there as well, since that’s the name of the policy if you don’t do multi-agent.
so in the callback, can you do something like Policy.from_checkpoint("<your algorithm checkpoint dir>/policies/default/")?
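As a sketch of that suggestion (the helper name is mine, not an RLlib API): build the policy sub-checkpoint path from the algorithm checkpoint directory and restore only that policy. Note that in recent RLlib versions the single-agent policy ID is typically "default_policy"; older threads sometimes show "default".

```python
import os

def policy_checkpoint_path(algo_checkpoint_dir, policy_id="default_policy"):
    """Path to one policy's sub-checkpoint inside an Algorithm checkpoint.

    RLlib lays checkpoints out as <algo_ckpt>/policies/<policy_id>/.
    """
    return os.path.join(algo_checkpoint_dir, "policies", policy_id)

# Usage inside the callback (requires Ray/RLlib):
# from ray.rllib.policy.policy import Policy
# policy = Policy.from_checkpoint(policy_checkpoint_path(ckpt_dir))
```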

I’d also like to share an example with you for something like this: https://github.com/ray-project/ray/blob/master/rllib/examples/connectors/self_play_with_policy_checkpoint.py#L54-L68

@gjoliver @arturn thanks for the new suggestions. It appears I was able to get the checkpoint restored successfully by restoring the Policy checkpoint instead of the Algorithm checkpoint. Even with a very simplified script, I still found the Algorithm restoration to hang. But I am now able to move on. Thank you very much.


If anyone experiences similar issues, please add to this thread. Thank you.

Hi everyone! I’ve posted two issues related to this topic: RLlib| Algorithm.from_checkpoint doesn't work correctly · Issue #41290 · ray-project/ray · GitHub and [RLlib] Issue with /rllib/rllib-saving-and-loading-algos-and-policies · Issue #40347 · ray-project/ray · GitHub. With the DreamerV3 algorithm I can’t restore the model’s weights with Algorithm.from_checkpoint(checkpoint_path): after two separate restores from the same checkpoint I get different weights. With the PPO algorithm the weights are restored correctly by Algorithm.from_checkpoint(checkpoint_path), but we suspect that the optimizer parameters are not restored correctly, and we lose all the metrics within 2-5 training iterations.
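One way to pin down the DreamerV3 symptom is to diff the weights of two restores from the same checkpoint. `get_weights()` on an RLlib Policy returns a dict of numpy arrays; the helper below (my sketch, not an RLlib utility) compares two such dicts generically.

```python
import numpy as np

def weights_equal(w1, w2, atol=0.0):
    """True iff both weight dicts have identical keys and matching values."""
    if w1.keys() != w2.keys():
        return False
    return all(np.allclose(w1[k], w2[k], atol=atol) for k in w1)

# Usage (requires Ray/RLlib):
# a1 = Algorithm.from_checkpoint(checkpoint_path)
# a2 = Algorithm.from_checkpoint(checkpoint_path)
# assert weights_equal(a1.get_policy().get_weights(),
#                      a2.get_policy().get_weights())
```

If this assertion fails for DreamerV3 but passes for PPO, that would isolate the restore bug to the weight-loading path rather than the training loop.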

If you have any ideas on how to restore all parts of the algorithm from a checkpoint, I'll be very happy to hear them.

Thank you in advance.


Maybe this post is helpful