Hi,
I used PPO to train an agent and saved the policy as a checkpoint. But when I restore the policy and call `policy.compute_single_action()` to get an action, it returns actions that are outside the action space. I'd like to know whether I did something wrong or whether this is a bug in RLlib.
By the way, `trainer.compute_single_action()` does not output out-of-bounds actions.
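To make the symptom concrete, here is a toy illustration (plain Python, no RLlib; the function names are just illustrative) of what I think is happening: a Gaussian policy head samples unbounded values, so raw samples can land outside a box action space like `[-1, 1]` unless something clips them afterwards, which seems to happen at the trainer level but not at the policy level:

```python
import random

def gaussian_policy_sample(mean, std):
    # Raw sample from a Gaussian policy head -- unbounded,
    # like what I seem to get from policy.compute_single_action().
    return random.gauss(mean, std)

def clip_action(action, low, high):
    # What the trainer-level call appears to do before returning the action.
    return max(low, min(high, action))

random.seed(0)
low, high = -1.0, 1.0

# A mean near the boundary makes out-of-bounds raw samples likely.
raw = [gaussian_policy_sample(0.9, 0.5) for _ in range(100)]
out_of_bounds = [a for a in raw if not low <= a <= high]
clipped = [clip_action(a, low, high) for a in raw]

print(len(out_of_bounds) > 0)                   # some raw samples exceed the bounds
print(all(low <= a <= high for a in clipped))   # clipping keeps them in the space
```

If that's the intended design, is the expectation that users clip (or unsquash) actions themselves when calling the policy directly?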