I have a problem with how to define the action space and observation space when I use action masking.
In the __init__ method there is:
self.observation_space = Dict(
    {
        "action_mask": Box(0, 1, (3,)),
        "actual_obs": Box(0, 2, (4,)),
    }
)
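For completeness, this sits in a gym-style environment class roughly like the sketch below (the Discrete(3) action space is simplified here; only the masking part matters):

import gym
import numpy as np
from gym.spaces import Box, Dict, Discrete

class MaskedEnv(gym.Env):
    def __init__(self, config=None):
        super().__init__()
        # one mask entry per discrete action
        self.action_space = Discrete(3)
        self.observation_space = Dict(
            {
                "action_mask": Box(0, 1, (3,)),
                "actual_obs": Box(0, 2, (4,)),
            }
        )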
The return value in the reset and step methods is, for example:
self.observation = {
    "action_mask": np.array([1, 1, 1]),
    "actual_obs": np.array([1.5, 1.5, 1.5, 1.0]),
}
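Continuing the MaskedEnv sketch above, reset and step build that dict and return it (the reward and done values are placeholders; the real logic does not change the shapes):

    def reset(self):
        self.observation = {
            "action_mask": np.array([1, 1, 1]),
            "actual_obs": np.array([1.5, 1.5, 1.5, 1.0]),
        }
        return self.observation

    def step(self, action):
        # placeholder transition: the real env updates actual_obs and the mask here
        self.observation = {
            "action_mask": np.array([1, 1, 1]),
            "actual_obs": np.array([1.5, 1.5, 1.5, 1.0]),
        }
        return self.observation, 0.0, False, {}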
The error is:
ray.rllib.utils.error.EnvError: Env's `observation_space` Dict(action_mask:Box([0. 0. 0.], [1. 1. 1.], (3,), float32), actual_obs:Box([0. 0. 0. 0.], [2. 2. 2. 2.], (4,), float32)) does not contain returned observation after a reset ({'action_mask': array([1, 1, 1]), 'actual_obs': array([1.5, 1.5, 1.5, 1. ])})!
I will be grateful for any suggestions.