How to define action space and observation space with masking

Peter_Pirog · May 31, 2022, 8:17pm

I am having trouble defining the action space and observation space when I use masking.

In the __init__ method there is:

        self.observation_space = Dict(
            {
                "action_mask": Box(0, 1, (3,)),
                "actual_obs": Box(0, 2, (4,)),
            }
        )

The observation returned by the reset and step methods is, for example:

        self.observation = {
            "action_mask": np.array([1, 1, 1]),
            "actual_obs": np.array([1.5, 1.5, 1.5, 1.0]),
        }

The error is:

ray.rllib.utils.error.EnvError: Env's `observation_space` Dict(action_mask:Box([0. 0. 0.], [1. 1. 1.], (3,), float32), actual_obs:Box([0. 0. 0. 0.], [2. 2. 2. 2.], (4,), float32)) does not contain returned observation after a reset ({'action_mask': array([1, 1, 1]), 'actual_obs': array([1.5, 1.5, 1.5, 1. ])})!

I will be grateful for any suggestions.

kourosh · May 31, 2022, 9:55pm

@Peter_Pirog It looks like the issue comes from the gym spaces logic itself (gym/box.py at master · openai/gym · GitHub).
The dtype of action_mask is an integer type, while the corresponding Box in observation_space has dtype float32. When the dtypes do not match, the contains() method of the Box class returns False, which is what RLlib is complaining about. Can you check whether your problem is solved if you modify the env to change the dtype of action_mask from integer to float32? For example, you can use x = x.astype(np.float32).
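
A minimal sketch of the dtype check described above, using only NumPy. It assumes that the Box membership test compares dtypes with np.can_cast before checking shape and bounds, as in the linked box.py; the helper name box_like_contains is hypothetical and is not part of gym.

```python
import numpy as np


def box_like_contains(x, low, high, shape, dtype):
    """Hypothetical stand-in for the membership test of a Box space.

    Assumption: the dtype of x must be safely castable to the space dtype,
    and x must match the shape and lie within [low, high].
    """
    return bool(
        np.can_cast(x.dtype, dtype)
        and x.shape == shape
        and np.all(x >= low)
        and np.all(x <= high)
    )


# Integer mask against a float32 space: the dtype check fails.
int_mask = np.array([1, 1, 1])
print(box_like_contains(int_mask, 0, 1, (3,), np.float32))

# float64 values against a float32 space: the dtype check also fails.
obs64 = np.array([1.5, 1.5, 1.5, 1.0])
print(box_like_contains(obs64, 0, 2, (4,), np.float32))

# Arrays cast to the space dtype pass.
print(box_like_contains(int_mask.astype(np.float32), 0, 1, (3,), np.float32))
print(box_like_contains(obs64.astype(np.float32), 0, 2, (4,), np.float32))
```

Under this assumption, both entries of the observation in the question would be rejected: the values are in range, but neither the default integer dtype nor float64 can be safely cast to float32.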

Peter_Pirog · June 1, 2022, 5:08am

@kourosh Thank you for the answer. I will try it.

Peter_Pirog · June 1, 2022, 10:03pm

Below is sample code that works.
In the __init__ method there is:

        self.observation_space = Dict(
            {
                "action_mask": Box(0, 1, (3,), dtype=int),
                "actual_obs": Box(0, 2, (4,), dtype=np.float32),
            }
        )

The observation returned by the reset and step methods is, for example:

        self.observation = {
            "action_mask": np.array([1, 1, 1], dtype=int),
            "actual_obs": np.array([1.5, 1.5, 1.5, 1.0], dtype=np.float32),
        }
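
One way to keep the returned observation in line with the declared dtypes is to cast each entry right before returning it from reset and step. This is only a sketch: OBS_DTYPES and cast_observation are hypothetical names, and the dtypes mirror the working example above.

```python
import numpy as np

# Hypothetical mapping that mirrors the dtypes declared in observation_space.
OBS_DTYPES = {"action_mask": int, "actual_obs": np.float32}


def cast_observation(obs):
    """Return a new dict with every entry as an array of its declared dtype."""
    return {
        key: np.asarray(value, dtype=OBS_DTYPES[key])
        for key, value in obs.items()
    }


raw = {"action_mask": [1, 1, 1], "actual_obs": [1.5, 1.5, 1.5, 1.0]}
obs = cast_observation(raw)
print({key: value.dtype for key, value in obs.items()})
```

Casting in one place means the space definition and the returned arrays cannot drift apart when the observation is built from Python lists or from arrays with default dtypes.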
