AlgorithmConfig for multi-agent env with different observation/action spaces

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I’m trying to get my multi-agent RL (MARL) environment, in which different agents have different observation/action spaces, to work. I’m getting the error

    observation_space not provided in PolicySpec for {pid} and env does not
    have an observation space OR no spaces received from other workers'
    env(s) OR no observation_space specified in config!

from the AlgorithmConfig.get_multi_agent_setup function.
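
For context, a stripped-down sketch of what my setup looks like (the class name, agent IDs, and spaces below are placeholders, not my actual code; reset/step omitted):

    import gymnasium as gym

    from ray.rllib.env.multi_agent_env import MultiAgentEnv


    class MyMultiAgentEnv(MultiAgentEnv):
        """Placeholder env: per-agent spaces kept in a plain Python dict."""

        def __init__(self, config=None):
            super().__init__()
            self._agent_ids = {"agent_0", "agent_1"}
            # A plain dict of gym.Space objects is NOT itself a gym.Space,
            # so isinstance(self.observation_space, gym.Space) is False.
            self.observation_space = {
                "agent_0": gym.spaces.Box(-1.0, 1.0, (4,)),
                "agent_1": gym.spaces.Box(-1.0, 1.0, (8,)),
            }
            self.action_space = {
                "agent_0": gym.spaces.Discrete(2),
                "agent_1": gym.spaces.Discrete(4),
            }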

I’ve looked at the code of this function and can see that the case of per-agent observation spaces is handled starting at line 2772:

            elif env_obs_space is not None:
                # Multi-agent case AND different agents have different spaces:
                # Need to reverse map spaces (for the different agents) to certain
                # policy IDs.
                if (
                    isinstance(env, MultiAgentEnv)
                    and hasattr(env, "_obs_space_in_preferred_format")
                    and env._obs_space_in_preferred_format
                ):
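
So, as far as I understand, for this branch to apply the env would need to expose its spaces as a gym.spaces.Dict keyed by agent ID and set the preferred-format flags, roughly like this (a sketch based on my reading of MultiAgentEnv, not something I have verified end to end):

    import gymnasium as gym

    from ray.rllib.env.multi_agent_env import MultiAgentEnv


    class PreferredFormatEnv(MultiAgentEnv):
        """Sketch: per-agent spaces wrapped in a gym.spaces.Dict."""

        def __init__(self, config=None):
            super().__init__()
            self._agent_ids = {"agent_0", "agent_1"}
            # gym.spaces.Dict IS a gym.Space, so env_obs_space gets set...
            self.observation_space = gym.spaces.Dict({
                "agent_0": gym.spaces.Box(-1.0, 1.0, (4,)),
                "agent_1": gym.spaces.Box(-1.0, 1.0, (8,)),
            })
            self.action_space = gym.spaces.Dict({
                "agent_0": gym.spaces.Discrete(2),
                "agent_1": gym.spaces.Discrete(4),
            })
            # ...and these flags should make the reverse-mapping branch
            # quoted above apply.
            self._obs_space_in_preferred_format = True
            self._action_space_in_preferred_format = True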

The problem is that this branch is never entered, because env_obs_space is None.

And env_obs_space is None because, earlier in the code, it is only assigned when the env’s observation_space is itself a gym.Space, which, as far as I can tell, is only the case when all agents share the same space (line 2733 of algorithm_config.py):

    elif env is not None:
        if hasattr(env, "observation_space") and isinstance(env.observation_space, gym.Space):
            env_obs_space = env.observation_space

        if hasattr(env, "action_space") and isinstance(env.action_space, gym.Space):
            env_act_space = env.action_space

However, in my case the env’s observation space is a plain Python dict of gym.Space objects (one per agent), so the isinstance check fails. I would be grateful for any suggestions on how to work around this issue with different observation/action spaces for different agents.
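
One workaround I’m considering is to skip the space inference entirely and pass the spaces explicitly per policy via PolicySpec, along these lines (policy IDs and spaces are placeholders matching the sketch env above):

    import gymnasium as gym

    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.policy.policy import PolicySpec

    config = (
        PPOConfig()
        .environment(env=MyMultiAgentEnv)  # the placeholder env from above
        .multi_agent(
            policies={
                "agent_0": PolicySpec(
                    observation_space=gym.spaces.Box(-1.0, 1.0, (4,)),
                    action_space=gym.spaces.Discrete(2),
                ),
                "agent_1": PolicySpec(
                    observation_space=gym.spaces.Box(-1.0, 1.0, (8,)),
                    action_space=gym.spaces.Discrete(4),
                ),
            },
            # Map each agent to the policy with the same ID.
            policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
        )
    )

Would explicitly providing the spaces like this be the recommended approach, or is there a cleaner fix?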