How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have defined an environment that has the following action space:
self.action_space = Box(low=np.array([0.01, 0.01, 0, 0.01, 0.01, 1]), high=np.array([25, 5, 3, 1, 1, 15]))
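For context, here is a minimal sketch of the environment definition (the class name MyEnv is a placeholder, and I assume gymnasium, which recent Ray versions use; reset/step are omitted):

import numpy as np
import gymnasium as gym
from gymnasium.spaces import Box

class MyEnv(gym.Env):  # placeholder name
    def __init__(self, config=None):
        # Continuous action space with per-dimension bounds
        self.action_space = Box(
            low=np.array([0.01, 0.01, 0, 0.01, 0.01, 1]),
            high=np.array([25, 5, 3, 1, 1, 15]),
        )
        # observation_space, reset(), and step() omitted here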
Then, I used PPO to train an agent on this environment. During training, the bounds are respected and no problems arise.
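Roughly, the training looks like this (a sketch only; my actual config has more settings, and the number of iterations here is arbitrary):

from ray.rllib.algorithms.ppo import PPOConfig

config = PPOConfig().environment(MyEnv)
algo = config.build()
for _ in range(100):
    algo.train()
checkpoint = algo.save()  # checkpoint used below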
However, I now want to use the policy that resulted from training. To do so, I use the following code:
from ray.rllib.policy.policy import Policy

agent = Policy.from_checkpoint(path_to_checkpoint)["default_policy"]
action = agent.compute_single_action(state)[0]  # returns (action, state_outs, info); take the action
However, this consistently produces actions that fall outside the predefined bounds. How can I correct this? I tried setting
agent.config["clip_actions"] = True
or passing the flag directly:
action = agent.compute_single_action(state,clip_actions=True)[0]
but neither solved the problem.
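As a stopgap, I can clip the actions manually against the space bounds (a sketch; this assumes the restored policy exposes the original action space, and it masks the symptom rather than fixing whatever is wrong):

import numpy as np

# Assumption: agent.action_space carries the env's original Box bounds
low, high = agent.action_space.low, agent.action_space.high
action = np.clip(agent.compute_single_action(state)[0], low, high)

But I would prefer to understand why the restored policy ignores the bounds in the first place.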
Any help is appreciated.
Thanks!