Restored policy gives actions that are out of bounds


I used PPO to train an agent and saved the policy via a checkpoint. But when I restore the policy and call `compute_single_action` on it, the policy outputs actions that are outside the action space. I want to know if I did something wrong, or whether this is a problem in RLlib.

By the way, `trainer.compute_single_action` does not output actions that are out of bounds.

Have a look at what I wrote over here. The Algorithm (trainer) unsquashes actions before returning them; calling `compute_single_action` directly on the restored Policy skips that step, so you get the raw normalized action.
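To make the unsquashing concrete: with `normalize_actions=True` (the default for continuous-action PPO), the policy samples in a normalized `[-1, 1]` range and the Algorithm maps that back to the environment's `Box` bounds before handing you the action. A minimal NumPy sketch of that mapping (this is an illustration of the idea, not RLlib's exact code; in practice you can use `unsquash_action` from `ray.rllib.utils.spaces.space_utils` with `policy.action_space_struct`):

```python
import numpy as np

def unsquash(action, low, high):
    """Map a normalized action in [-1, 1] back to [low, high],
    mirroring what the Algorithm does before returning actions."""
    action = np.clip(action, -1.0, 1.0)  # guard against slight overshoot
    return low + (high - low) * (action + 1.0) / 2.0

# Example: a raw policy output of 0.9 in a Box(-2, 2) action space
low, high = np.array([-2.0]), np.array([2.0])
env_action = unsquash(np.array([0.9]), low, high)
print(env_action)  # stays within [-2, 2]
```

So if your restored `Policy.compute_single_action` returns values outside the space, applying this unsquash step with your environment's bounds should recover the in-bounds action the trainer would have given you.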