Hello,
I’m running PPO with a custom env. Training runs fine — I verified that the predicted actions fall within the action space.
I save a checkpoint every iteration, but when I load it with `Policy.from_checkpoint`, the actions predicted by the loaded policy range from -1 to 1, while the action space should be 0–30.
Is there any postprocessing I’m missing?
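For context, manually mapping the squashed [-1, 1] output back onto a Box(0, 30) space would look something like the sketch below (assuming a simple linear unsquash — I’m not sure this is exactly what happens internally during training, so the bounds and formula here are my assumption):

```python
import numpy as np

def unsquash(action, low=0.0, high=30.0):
    """Linearly map an action from [-1, 1] back to [low, high]."""
    action = np.clip(action, -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)

print(unsquash(-1.0))  # 0.0
print(unsquash(0.0))   # 15.0
print(unsquash(1.0))   # 30.0
```

Applying this to the loaded policy’s output does give values inside my action space, but I’d like to know whether this is the intended fix or whether I’m loading the checkpoint wrong.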
Help please!