How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
I’m opening this topic to report an issue that appears when using Tune’s hyperparameter tuning with a multi-agent configuration.
It can be the case that one wants to perform a grid search over parameters that determine the observation/action space of the environment. It would be very helpful to use tune.grid_search() to automatically start several training runs for different observation spaces.
However, in a multi-agent setting we need to specify the policies key in the multiagent part of the configuration. This requires specifying the observation and action spaces of the agents beforehand, which makes it impossible to perform a grid search over them.
This makes me wonder: couldn’t the observation/action spaces be automatically retrieved by Ray after the Tune variables are resolved? It would be a nice feature, and it would also avoid having to pass observation/action spaces in the configuration (which is annoying, since one needs to create the Gym space objects before the environment is instantiated).
What I’m referring to:
# === Settings for Multi-Agent Environments ===
"multiagent": {
    # Map of type MultiAgentPolicyConfigDict from policy ids to tuples
    # of (policy_cls, obs_space, act_space, config). This defines the
    # observation and action spaces of the policies and any extra config.
    "policies": {},
see the full config here
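To make the conflict concrete, here is a rough sketch of what I would like to do. Everything in it is made up for illustration (the registered env name, the grid_size parameter, and the way the spaces are built); the point is only that the grid-searched parameter determines the observation space, while the policies dict already needs concrete spaces when the config is built:

import gym
import numpy as np
from ray import tune

# Hypothetical helper: the grid-searched parameter determines the obs space.
def make_obs_space(grid_size):
    return gym.spaces.Box(0.0, 1.0, shape=(grid_size, grid_size), dtype=np.float32)

config = {
    "env": "my_multi_agent_env",  # hypothetical registered env name
    "env_config": {
        "grid_size": tune.grid_search([5, 10, 20]),  # what I would like to do
    },
    "multiagent": {
        "policies": {
            # Problem: the (policy_cls, obs_space, act_space, config) tuple
            # needs concrete gym.Space objects here, but grid_size is only
            # resolved later, when Tune expands the grid search.
            "shared_policy": (None, make_obs_space(10), gym.spaces.Discrete(4), {}),
        },
        "policy_mapping_fn": lambda agent_id: "shared_policy",
    },
}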
Hey @fedetask , great question. Configuring observation- and action spaces in a multi-agent setup is often tricky. Some multi-agent envs may even have different spaces for different agents.
However, if your environment exposes the observation_space and action_space properties (which depend on those grid-searched parameters you mentioned), then RLlib is able to automatically infer the spaces for the policies, like so:
from ray.rllib.policy.policy import PolicySpec

config = {
    ...,
    "multiagent": {
        "policies": {
            # Note: no obs_space/act_space given here.
            "p1": PolicySpec(config={..., "env_config": ...}),
            "p2": PolicySpec(config={..., "env_config": ...}),  # <- your config overrides here
        },
    },
}
If observation_space or action_space are missing from the PolicySpec object, then RLlib will try to infer these automatically.
An example: ray.rllib.examples.rock_paper_scissors_multiagent.py
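For illustration, here is a minimal sketch (not taken from that example; the env name, the grid_size parameter, and the spaces are placeholders) of an env that derives its spaces from env_config, so that RLlib can pick them up once Tune has resolved the grid-searched value:

import gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MyGridEnv(MultiAgentEnv):
    """Illustrative sketch: spaces are derived from the (grid-searched) env_config."""

    def __init__(self, env_config):
        super().__init__()
        grid_size = env_config.get("grid_size", 10)
        # Because the env exposes these properties, a PolicySpec may leave
        # observation_space / action_space unset and RLlib can infer them.
        self.observation_space = gym.spaces.Box(
            0.0, 1.0, shape=(grid_size, grid_size), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(4)
        self.agents = ["agent_0", "agent_1"]

    def reset(self):
        return {a: self.observation_space.sample() for a in self.agents}

    def step(self, action_dict):
        obs = {a: self.observation_space.sample() for a in action_dict}
        rewards = {a: 0.0 for a in action_dict}
        dones = {a: False for a in action_dict}
        dones["__all__"] = False
        return obs, rewards, dones, {}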
@sven1977 @fedetask
In the case of PettingZoo envs, it seems impossible to use different spaces for different agents with the obs/action space inference from the env:
Master pettingzoo env code: https://github.com/ray-project/ray/blob/master/rllib/env/wrappers/pettingzoo_env.py
# Get first observation space, assuming all agents have equal space
self.observation_space = self.par_env.observation_space(self.par_env.agents[0])

# Get first action space, assuming all agents have equal space
self.action_space = self.par_env.action_space(self.par_env.agents[0])

assert all(
    self.par_env.observation_space(agent) == self.observation_space
    for agent in self.par_env.agents
), (
    "Observation spaces for all agents must be identical. Perhaps "
    "SuperSuit's pad_observations wrapper can help (usage: "
    "`supersuit.aec_wrappers.pad_observations(env)`"
)
Also, as of today, the MultiAgentDict action/obs space inference from MultiAgentEnv is not working as it should when different agents have different spaces. That’s why PettingZooEnv uses a hacky workaround: it retrieves the spaces from the first agent only and uses them for all agents.
But it should be possible to define a MultiAgentDict of spaces; a lot of work has already been done for this in MultiAgentEnv.py.
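As a rough sketch of what I mean (purely illustrative, not the current PettingZooEnv behavior; the agent names and spaces are made up), per-agent spaces could be exposed as dicts keyed by agent ID:

import gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class HeterogeneousSpacesEnv(MultiAgentEnv):
    """Sketch: two agents with different observation/action spaces,
    exposed as dicts keyed by agent ID."""

    def __init__(self, env_config=None):
        super().__init__()
        self._agent_ids = {"drone", "rover"}
        self.observation_space = gym.spaces.Dict({
            "drone": gym.spaces.Box(0.0, 1.0, shape=(8,), dtype=np.float32),
            "rover": gym.spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32),
        })
        self.action_space = gym.spaces.Dict({
            "drone": gym.spaces.Discrete(6),
            "rover": gym.spaces.Discrete(3),
        })

    def reset(self):
        return {a: self.observation_space[a].sample() for a in self._agent_ids}

    def step(self, action_dict):
        obs = {a: self.observation_space[a].sample() for a in action_dict}
        rewards = {a: 0.0 for a in action_dict}
        dones = {"__all__": False}
        return obs, rewards, dones, {}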
Please have a look at my post, I just updated it : https://discuss.ray.io/t/multiagents-type-actions-observation-space-defined-in-environement/5120/5
I’ll keep an eye on this post too, I’m having a similar use case
Edit: This PR will make multi-agent env space inference work without having to provide spaces in the PolicySpec config: [RLlib] Discussion 6060 and 5120: auto-infer different agents' spaces in multi-agent env. by sven1977 · Pull Request #24649 · ray-project/ray · GitHub
@fedetask Could you try using
"env_config": tune.sample_from(
    lambda spec: env1 if spec.config["myparameter"] == myvalue else env2
)
in your environment configuration for the agents?
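For context, here is roughly how I imagine that fitting into a full Tune config. Everything here is a placeholder (env1, env2, myvalue, and the top-level "myparameter" key); depending on your RLlib version, a custom key like "myparameter" may need to live somewhere else, e.g. inside env_config, so treat this as a sketch only:

from ray import tune

# Placeholder env_config dicts for the two variants (purely illustrative):
env1 = {"grid_size": 5}
env2 = {"grid_size": 10}
myvalue = 5

config = {
    "myparameter": tune.grid_search([5, 10]),
    "env_config": tune.sample_from(
        # Picks the matching env_config once "myparameter" is resolved.
        lambda spec: env1 if spec.config["myparameter"] == myvalue else env2
    ),
}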