Observation space and action space in multi-agent env

Hey everyone,
I was wondering why observation-space and action-space specs are provided in some of the multi-agent environment examples. I looked into the FlexAgentsMultiAgent class from the multi_agent_env.py rllib example, and it looks to me like the dimensionality of obs, rew, done, and info comes from the environment that backs each agent (in this case MockEnv()). So what do I need self.observation_space and self.action_space for?

import random

import gym

# MultiAgentEnv and MockEnv come from rllib (ray.rllib); the exact
# import paths depend on the installed ray version.

class FlexAgentsMultiAgent(MultiAgentEnv):
    """Env of independent agents, each of which exits after n steps."""

    def __init__(self):
        self.agents = {}
        self.agentID = 0
        self.dones = set()
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)
        self.resetted = False

    def spawn(self):
        # Spawn a new agent into the current episode.
        agentID = self.agentID
        self.agents[agentID] = MockEnv(25)
        self.agentID += 1
        return agentID

    def reset(self):
        self.agents = {}
        # Start each episode with one agent; without this, reset()
        # would iterate an empty dict and return an empty obs dict.
        self.spawn()
        self.resetted = True
        self.dones = set()

        obs = {}
        for i, a in self.agents.items():
            obs[i] = a.reset()

        return obs

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}
        # Apply the actions.
        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.agents[i].step(action)
            if done[i]:
                self.dones.add(i)

        # Sometimes, add a new agent to the episode.
        if random.random() > 0.75:
            i = self.spawn()
            obs[i], rew[i], done[i], info[i] = self.agents[i].step(action)
            if done[i]:
                self.dones.add(i)

        # Sometimes, kill an existing agent.
        if len(self.agents) > 1 and random.random() > 0.25:
            keys = list(self.agents.keys())
            key = random.choice(keys)
            done[key] = True
            del self.agents[key]

        done["__all__"] = len(self.dones) == len(self.agents)
        return obs, rew, done, info

You're right, the dimensionality of the actions, observations, etc. is derived here from MockEnv.
I think the RL algorithms that train on the FlexAgentsMultiAgent environment still expect the environment to define self.observation_space and self.action_space.
Here, they are set to the same dimensions as MockEnv's spaces.
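To make the relationship concrete, here is a simplified stand-in for what rllib's MockEnv roughly does (this is a sketch for illustration, not the exact rllib source): the per-agent spaces and the per-step obs/rew/done/info values all come from this wrapped env, which is why FlexAgentsMultiAgent's own attributes only mirror them.

```python
try:
    import gymnasium as gym  # newer installs ship gymnasium
except ImportError:
    import gym  # older installs use classic gym

class MockEnvSketch:
    """Simplified stand-in for rllib's MockEnv: a do-nothing env that
    emits a constant observation and reward and terminates after
    ``episode_length`` steps. Plain class (no gym.Env subclass) to
    stay agnostic to gym/gymnasium API differences."""

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.i = 0
        # The per-agent spaces live here, on the wrapped env.
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        self.i = 0
        return 0

    def step(self, action):
        self.i += 1
        done = self.i >= self.episode_length
        # Constant obs and reward; the shapes here are what the
        # multi-agent wrapper passes through untouched.
        return 0, 1.0, done, {}
```

So when FlexAgentsMultiAgent declares `gym.spaces.Discrete(2)` for itself, it is just restating what each wrapped MockEnv already defines.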

Update: The docs for base_env.py say the observation_space and action_space attributes are required only for single-agent environments, not multi-agent ones.
Maybe it would then still work without the explicit declaration of the obs and action space in FlexAgentsMultiAgent. Did you try?

I didn't test leaving it out yet. The point I was interested in was mainly whether it has an influence on the agents, since different agent types could potentially have different action and observation spaces, so defining them on the top-level environment (potentially for all of them) didn't seem right.

@Blubberblub I answer this in some detail in this rllib issue. For MultiAgentEnv, having an observation_space and action_space attached to the environment doesn’t mean anything. The only thing that matters is the observation and action space you define for the policies. As I argue in the GitHub issue, we shouldn’t store the observation and action space in the environment itself because that locks our minds into thinking about multi-agent simulations as always being homogeneous. I have created a whole simulation framework called Abmarl that stores the observation and action spaces for each agent instead of the environment.
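As a rough sketch of what "define the spaces on the policies, not the environment" can look like in an rllib-style multiagent config of that era (the policy names, space sizes, and mapping rule below are made up for illustration), each policy carries its own observation and action space, so heterogeneous agents stop being a problem:

```python
try:
    import gymnasium as gym  # newer installs ship gymnasium
except ImportError:
    import gym  # older installs use classic gym

# Hypothetical heterogeneous agents: "scout" agents observe more
# than "worker" agents. Each policy tuple is
# (policy_class_or_None, obs_space, act_space, extra_config).
policies = {
    "scout_policy": (None, gym.spaces.Discrete(4), gym.spaces.Discrete(2), {}),
    "worker_policy": (None, gym.spaces.Discrete(2), gym.spaces.Discrete(2), {}),
}

def policy_mapping_fn(agent_id):
    # Route each agent ID to the policy (and hence the spaces)
    # that fit it; the naming convention here is invented.
    return "scout_policy" if str(agent_id).startswith("scout") else "worker_policy"

config = {
    "multiagent": {
        "policies": policies,
        "policy_mapping_fn": policy_mapping_fn,
    },
}
```

With this shape, the environment itself never has to commit to a single observation_space or action_space, which is exactly the argument above: the spaces belong to the agents/policies, not the simulation.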