What is the proper way to deal with varying observation space?

pengzh · April 13, 2021, 6:01pm

Hi there! I am working on an environment in which agents can die during an episode.

For example, at the t=0, there might be two agents: agent0, agent1.

Later, some agents might die, so only some of the agents are still alive: agent1.

There could also be some new agents being added: agent1, agent2, agent3.

So the observation space as well as the action space is always changing.

However, the check_shape function in preprocessor always breaks the training.

Does anyone have some idea on this issue? Thanks!!

sangcho · April 13, 2021, 6:03pm

cc @sven1977 Can you take a look at this question?

pengzh · April 13, 2021, 6:08pm

A workaround is to use a custom preprocessor. But I think better options would be to (1) provide a config option to disable the shape check, or (2) collect the latest observation_space / action_space from the runtime environment.

I remember that in a very old version of RLlib, say v0.6, this issue did not exist and I could change obs, reward, done, info however I liked, with the only constraint that the keys in those dicts are aligned with each other.

sven1977 · April 14, 2021, 7:38am

Hey @pengzh , which algorithm are you using? QMIX?
If you are using a non-MARL specific algo (using our multiagent API), your observation space should not depend on how many agents are present in the episode, but should be an individual agent’s space.
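To illustrate the per-agent-space point, here is a minimal sketch in plain Python. It is not RLlib's `MultiAgentEnv`; the class `SpawnDieEnv`, the constant `OBS_DIM`, and the spawn/death schedule are all made up for illustration. The set of keys in the returned dicts changes as agents die or spawn, while each individual observation keeps the same fixed size.

```python
import random

OBS_DIM = 4  # hypothetical fixed size of ONE agent's observation


class SpawnDieEnv:
    """Toy env: agents come and go, but each agent's obs shape never changes."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.alive = []
        self.next_id = 0
        self.t = 0

    def _spawn(self):
        self.alive.append(f"agent{self.next_id}")
        self.next_id += 1

    def _obs(self):
        # One fixed-size observation per agent that is currently alive.
        return {a: [self.rng.random() for _ in range(OBS_DIM)] for a in self.alive}

    def reset(self):
        self.alive, self.next_id, self.t = [], 0, 0
        self._spawn()
        self._spawn()
        return self._obs()

    def step(self, actions):
        self.t += 1
        if self.t == 2:
            # Two new agents join the episode.
            self._spawn()
            self._spawn()
        obs = self._obs()
        rewards = {a: 0.0 for a in obs}
        dones = {a: False for a in obs}
        if self.t == 1:
            # agent0 gets a final observation and is marked done.
            dones["agent0"] = True
        # Agents that are done no longer appear in later steps.
        self.alive = [a for a in self.alive if not dones[a]]
        dones["__all__"] = self.t >= 3
        return obs, rewards, dones, {}
```

Stepping this env yields keys `agent0, agent1`, then `agent1` alone as the survivor, then `agent1, agent2, agent3`, matching the scenario described at the top of the thread, and every observation has length `OBS_DIM` throughout.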

pengzh · April 14, 2021, 5:15pm

Hey Sven! I am just using independent PPO (with shared policy name, so there is only one policy).

I did find that the remote workers are created with (policy_cls, env.observation_space, env.action_space, config), and that is what causes my issue.

I think that, since config['multiagent']['policies'] should already specify the obs space for each policy, rllib should not query env.observation/action_space to create policies in the remote worker. This is problematic, because the environment might have different “kinds” of agents and they might not have the same spaces.

sven1977 · April 19, 2021, 7:04pm

Hey @pengzh I see your problem. But the thing is that a Policy can only handle one observation space and if you have different agents using the same Policy, but having different obs spaces, that’ll not work.
The main reason is that the Policy’s model(s) would have to have different first layer weight matrices to handle the different input formats.
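A small sketch of why this is the case, using plain Python lists instead of a real model. The helper names `make_first_layer` and `first_layer_forward` are hypothetical and only stand in for a network's first dense layer: its weight matrix has one column per observation entry, so an observation of a different length cannot be fed through it.

```python
import random


def make_first_layer(obs_dim, hidden_dim, seed=0):
    """Build a hidden_dim x obs_dim weight matrix (a list of rows)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(hidden_dim)]


def first_layer_forward(weights, obs):
    """Compute weights @ obs, refusing observations of the wrong length."""
    obs_dim = len(weights[0])
    if len(obs) != obs_dim:
        raise ValueError(f"expected observation of length {obs_dim}, got {len(obs)}")
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


# The shared policy was built for observations of length 4.
weights = make_first_layer(obs_dim=4, hidden_dim=8)

# An agent with a length-4 observation works.
hidden = first_layer_forward(weights, [0.1, 0.2, 0.3, 0.4])

# An agent of a different "kind" with a length-2 observation does not.
try:
    first_layer_forward(weights, [0.1, 0.2])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
```

Supporting the second agent would need a second weight matrix with two columns, which in practice means a second policy.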

pengzh · April 20, 2021, 1:26am

Hi @sven1977 , my point is that, in this line:

github.com/ray-project/ray/blob/4f66309e1940ceadbb8397b0c99fab7a5c2140d6/rllib/evaluation/rollout_worker.py#L1216


    return policy
elif not issubclass(policy, Policy):
    raise ValueError("policy must be a rllib.Policy class")
else:
    if (isinstance(env, MultiAgentEnv)
            and not hasattr(env, "observation_space")):
        raise ValueError(
            "MultiAgentEnv must have observation_space defined if run "
            "in a single-agent configuration.")
    if env is not None:
        return {
            DEFAULT_POLICY_ID: (policy, env.observation_space,
                                env.action_space, {})
        }
    else:
        return {
            DEFAULT_POLICY_ID: (policy, spaces[DEFAULT_POLICY_ID][0],
                                spaces[DEFAULT_POLICY_ID][1], {})
        }

RLlib assumes that there is only one kind of observation/action space for all policies, namely env.observation_space and env.action_space, if I understand it correctly.

So this hard-coded setting might make it hard for users to work with multiple spaces.

sven1977 · April 20, 2021, 9:17am

Sorry, I’m still not sure I understand 100%. The code you mention above is only used when you do not provide a policies dict, in which case RLlib has to create one automatically from the env’s spaces.

But even if you specify a policies dict in your multiagent config, it’s not possible for different agents that all use the same(!) policy (as per your policy_mapping_fn) to have different observation spaces.
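As a rough sketch of the alternative this implies (one policy per agent "kind", each with its own spaces), the structure below mirrors the `policies` / `policy_mapping_fn` layout of the multiagent config discussed in this thread. Everything specific is an assumption made for illustration: the agent kinds `scout` and `tank`, the `kind_index` agent id format, and `SpaceSpec`, which is only a stand-in for real space objects so the snippet runs without extra libraries. The `*args, **kwargs` on the mapping function is there because the exact arguments passed to it may differ between RLlib versions.

```python
from collections import namedtuple

# Stand-in for a real space object; hypothetical, carries a shape only.
SpaceSpec = namedtuple("SpaceSpec", ["shape"])

# One entry per KIND of agent, not per agent instance.
# Layout assumed here: policy_id -> (policy class, obs space, action space, extra config),
# with None meaning "use the algorithm's default policy class".
policies = {
    "scout_policy": (None, SpaceSpec((4,)), SpaceSpec((2,)), {}),
    "tank_policy": (None, SpaceSpec((10,)), SpaceSpec((3,)), {}),
}


def policy_mapping_fn(agent_id, *args, **kwargs):
    """Map ids like 'scout_0' or 'tank_17' to the policy for that kind."""
    kind = agent_id.split("_")[0]
    return f"{kind}_policy"


multiagent_config = {
    "policies": policies,
    "policy_mapping_fn": policy_mapping_fn,
}
```

With this layout, agents can still appear and disappear freely during an episode, as long as every agent mapped to a given policy shares that policy's observation and action spaces.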
