How to define action space and observation space with masking

I am having trouble defining the action space and observation space when I use action masking.

In the `__init__` method there is:

        self.observation_space = Dict(
            {
                "action_mask": Box(0, 1, (3,), ),
                "actual_obs": Box(0, 2, (4,), ),
            }
        )

The reset and step methods return, for example:

        self.observation = {
            "action_mask": np.array([1, 1, 1]),
            "actual_obs": np.array([1.5, 1.5, 1.5, 1.0]),
        }

The error is:

ray.rllib.utils.error.EnvError: Env's `observation_space` Dict(action_mask:Box([0. 0. 0.], [1. 1. 1.], (3,), float32), actual_obs:Box([0. 0. 0. 0.], [2. 2. 2. 2.], (4,), float32)) does not contain returned observation after a reset ({'action_mask': array([1, 1, 1]), 'actual_obs': array([1.5, 1.5, 1.5, 1. ])})!

I would be grateful for any suggestions.

@Peter_Pirog It looks like the issue comes from the gym spaces logic itself (gym/box.py at master · openai/gym · GitHub).
The returned action_mask has an integer dtype (np.array([1, 1, 1]) defaults to an integer type), while the corresponding Box in observation_space has dtype float32. When the dtypes do not match, the contains() method of the Box class returns False, which is what RLlib is complaining about. Note that actual_obs is created as float64 by default, which does not fit a float32 Box either. Can you check whether the problem is solved if you modify the env so that the returned arrays have the same dtype as the space? You can use x = x.astype(np.float32), for example.
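The dtype mismatch described above can be reproduced with NumPy alone. The following is a minimal sketch, assuming the Box check relies on NumPy's safe-cast rule (`np.can_cast`), as recent gym versions do; `dtype_fits` is a hypothetical helper name, not part of gym or RLlib.

```python
import numpy as np

# Assumption: a gym Box created without an explicit dtype uses float32.
space_dtype = np.float32

mask_default = np.array([1, 1, 1])                # integer dtype by default
obs_default = np.array([1.5, 1.5, 1.5, 1.0])      # float64 by default
mask_fixed = mask_default.astype(np.float32)      # explicitly matches the space
obs_fixed = obs_default.astype(np.float32)


def dtype_fits(x, dtype):
    """Hypothetical helper: True if x.dtype can be safely cast to dtype."""
    return bool(np.can_cast(x.dtype, dtype))


for name, arr in [
    ("mask_default", mask_default),
    ("obs_default", obs_default),
    ("mask_fixed", mask_fixed),
    ("obs_fixed", obs_fixed),
]:
    print(name, arr.dtype, dtype_fits(arr, space_dtype))
```

Under that assumption, only the two arrays converted to float32 pass the dtype part of the check, which is consistent with the error message in the question.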


@kourosh Thank you for the answer. I will try it.

Below is sample code that works.
In the `__init__` method there is:

        self.observation_space = Dict(
            {
                "action_mask": Box(0, 1, (3,), dtype=int),
                "actual_obs": Box(0, 2, (4,), dtype=np.float32),
            }
        )

The reset and step methods return, for example:

        self.observation = {
            "action_mask": np.array([1, 1, 1], dtype=int),
            "actual_obs": np.array([1.5, 1.5, 1.5, 1.0], dtype=np.float32),
        }
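To catch this kind of mismatch before RLlib does, the observation can be checked inside the env before it is returned. With gym installed, `self.observation_space.contains(self.observation)` performs the real check. The sketch below only imitates it with NumPy so that it runs on its own: `SPEC` and `find_mismatches` are hypothetical names, and the dtype/shape/bounds logic is an approximation of what Box does, not gym's actual implementation.

```python
import numpy as np

# Hypothetical stand-in for the Dict space: key -> (low, high, shape, dtype)
SPEC = {
    "action_mask": (0, 1, (3,), np.dtype(int)),
    "actual_obs": (0, 2, (4,), np.dtype(np.float32)),
}


def find_mismatches(spec, obs):
    """Return a list of human-readable problems; empty if obs fits spec."""
    problems = []
    for key, (low, high, shape, dtype) in spec.items():
        if key not in obs:
            problems.append(f"{key}: missing from observation")
            continue
        x = obs[key]
        # dtype must be safely castable to the space's dtype
        if not np.can_cast(x.dtype, dtype):
            problems.append(f"{key}: dtype {x.dtype} does not fit {dtype}")
        # shape must match exactly; bounds are only checked when it does
        if x.shape != shape:
            problems.append(f"{key}: shape {x.shape} != {shape}")
        elif np.any(x < low) or np.any(x > high):
            problems.append(f"{key}: values outside [{low}, {high}]")
    return problems


good_obs = {
    "action_mask": np.array([1, 1, 1], dtype=int),
    "actual_obs": np.array([1.5, 1.5, 1.5, 1.0], dtype=np.float32),
}
bad_obs = {
    "action_mask": np.array([1, 1, 1], dtype=int),
    "actual_obs": np.array([1.5, 1.5, 1.5, 1.0]),  # float64 by default
}

print(find_mismatches(SPEC, good_obs))
print(find_mismatches(SPEC, bad_obs))
```

A per-key report like this points directly at the offending entry, whereas the RLlib error only says that the whole observation is not contained in the space.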