Multi-agent configuration incompatible with Ray hyperparam tuning

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I’m opening this topic to report an issue that appears when using Tune’s hyperparameter tuning with a multi-agent configuration.

One may want to run a grid search over parameters that determine the observation/action space of the environment. It would be very helpful to use tune.grid_search() to automatically start several training runs for different observation spaces.

However, if we’re in a multi-agent setting, we need to specify the policies key in the multi-agent part of the configuration. This requires specifying beforehand the observation and action space of agents in the environment, thus making it impossible to perform a grid search on them.

This makes me wonder: can’t the observation/action spaces be automatically retrieved by Ray after the tune variables are resolved? It would be a nice feature, and it would also avoid passing observation/action spaces to the configuration (which is annoying since one needs to create the Gym objects before the environment is instantiated).
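To make the request concrete, here is a minimal sketch in plain Python of the order of operations I have in mind. It does not use Ray at all: ToyEnv, expand_grid and the tuple-based "spaces" are hypothetical stand-ins, and the grid is expanded by hand rather than by Tune.

```python
from itertools import product


class ToyEnv:
    """Hypothetical environment whose spaces depend on its config."""

    def __init__(self, env_config):
        # Plain tuples stand in for Gym space objects in this sketch.
        self.observation_space = ("Box", env_config["obs_size"])
        self.action_space = ("Discrete", env_config["n_actions"])


def expand_grid(grid):
    """Yield one resolved config per combination of grid values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


grid = {"obs_size": [4, 8], "n_actions": [2, 3]}

resolved_spaces = []
for env_config in expand_grid(grid):
    # The env is built only after the grid values are fixed, so its
    # spaces can be read from it instead of being declared up front.
    env = ToyEnv(env_config)
    resolved_spaces.append((env.observation_space, env.action_space))
```

Each resolved configuration yields its own pair of spaces, which is the information the policies entry currently has to be given by hand.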

What I’m referring to:

    # === Settings for Multi-Agent Environments ===
    "multiagent": {
        # Map of type MultiAgentPolicyConfigDict from policy ids to tuples
        # of (policy_cls, obs_space, act_space, config). This defines the
        # observation and action spaces of the policies and any extra config.
        "policies": {},

see the full config here

Hey @fedetask , great question. Configuring observation- and action spaces in a multi-agent setup is often tricky. Some multi-agent envs may even have different spaces for different agents.
However, if your environment exposes the observation_space and action_space properties (which may themselves depend on the grid-searched parameters you mentioned), then RLlib can automatically infer the spaces for the policies like so:

# from ray.rllib.policy.policy import PolicySpec

        "p1": PolicySpec(config={..., "env_config": ...}),
        "p2": PolicySpec(config={..., "env_config": ...}),  # <- your config overrides here

If observation_space or action_space is missing from the PolicySpec object, then RLlib will try to infer it automatically.
An example:
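The fallback described above can be pictured with a small stand-in. This is only a sketch of the idea, not RLlib's actual code: SpecSketch, FixedSpacesEnv and fill_missing_spaces are hypothetical names, and tuples stand in for Gym spaces.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SpecSketch:
    """Stand-in for a policy spec (hypothetical, not RLlib's PolicySpec)."""
    observation_space: Optional[Any] = None
    action_space: Optional[Any] = None


class FixedSpacesEnv:
    """Hypothetical env exposing the two space attributes."""
    observation_space = ("Box", 8)   # tuples stand in for Gym spaces
    action_space = ("Discrete", 3)


def fill_missing_spaces(policies, env):
    """Copy each spec, taking any missing space from the env."""
    filled = {}
    for policy_id, spec in policies.items():
        obs = spec.observation_space
        act = spec.action_space
        filled[policy_id] = SpecSketch(
            observation_space=env.observation_space if obs is None else obs,
            action_space=env.action_space if act is None else act,
        )
    return filled


policies = {
    "p1": SpecSketch(),                              # both spaces inferred
    "p2": SpecSketch(action_space=("Discrete", 5)),  # explicit value is kept
}
filled = fill_missing_spaces(policies, FixedSpacesEnv())
```

Here p1 ends up with both of the env's spaces, while p2 keeps its explicitly given action space and only has its observation space filled in.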

@sven1977 @fedetask
In the case of pettingzoo envs, it seems impossible to use different spaces for different agents with the obs/action_space inference from the env:
Master pettingzoo env code:

        # Get first observation space, assuming all agents have equal space
        self.observation_space = self.par_env.observation_space(self.par_env.agents[0])

        # Get first action space, assuming all agents have equal space
        self.action_space = self.par_env.action_space(self.par_env.agents[0])

        assert all(
            self.par_env.observation_space(agent) == self.observation_space
            for agent in self.par_env.agents
        ), (
            "Observation spaces for all agents must be identical. Perhaps "
            "SuperSuit's pad_observations wrapper can help (useage: "

Also, as of today, the MultiAgentDict action/obs space inference from MultiAgentEnv does not work as it should when different agents have different spaces. That’s why PettingZooEnv uses a hacky way to retrieve the spaces from the first agent only and applies them to all agents.
But it should be possible to define a MultiAgentDict of spaces; a lot of work has already been done for it in the
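As a rough illustration of the difference between the first-agent shortcut and a per-agent mapping, here is a sketch in plain Python. The agent names, the tuple "spaces" and both helper functions are hypothetical and are not part of RLlib or PettingZoo.

```python
def collect_spaces(agents, space_of):
    """Map every agent id to its own space instead of using the first one."""
    return {agent: space_of(agent) for agent in agents}


def all_identical(spaces):
    """True when every agent shares the same space."""
    values = list(spaces.values())
    return all(v == values[0] for v in values)


# Hypothetical two-agent setup with different observation shapes.
toy_obs_spaces = {"scout": ("Box", 4), "carrier": ("Box", 10)}

per_agent = collect_spaces(list(toy_obs_spaces), toy_obs_spaces.__getitem__)
first_only = toy_obs_spaces["scout"]  # what a first-agent shortcut would report
```

With these toy values the per-agent dict keeps both shapes, whereas the first-agent shortcut would describe the carrier agent incorrectly; an identity check like all_identical is the kind of condition the assert in the wrapper code enforces.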

Please have a look at my post, I just updated it:

I’ll keep an eye on this post too, since I have a similar use case.

Edit: This PR will make space inference for multi-agent envs work without providing spaces in the config’s PolicySpec: [RLlib] Discussion 6060 and 5120: auto-infer different agents' spaces in multi-agent env. by sven1977 · Pull Request #24649 · ray-project/ray · GitHub

@fedetask Could you try using

env_config: tune.sample_from(
        lambda spec: env1 if spec.config["myparameter"] == myvalue else env2
)

in your environment configuration for the agents?