Multi-agent action/observation spaces defined in the environment

Hi,

The context:
As I’m running my simulation within a game engine, it’s necessary to retrieve the action and observation spaces from the game at initialization.

In the single-agent case, it’s easy: connect to the game, ask for the space information, and define env.observation_space and env.action_space with it. If the policies don’t have any spaces defined, the ones from the env will be used.
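That single-agent flow can be sketched like this (a minimal, dependency-free sketch; `query_game_for_spaces` and `SingleAgentGameEnv` are hypothetical stand-ins for the real game-engine handshake, and plain Python values stand in for gym spaces):

```python
# Minimal sketch of the single-agent flow described above.
# `query_game_for_spaces` is a hypothetical stand-in for the
# game-engine handshake performed at env initialization.

def query_game_for_spaces():
    # In the real setup this would connect to the running game
    # and ask it for its space definitions.
    return {"obs_shape": (10,), "num_actions": 4}

class SingleAgentGameEnv:
    def __init__(self):
        spec = query_game_for_spaces()
        # In a real env these would be gym.spaces.Box / gym.spaces.Discrete;
        # plain values keep the sketch self-contained.
        self.observation_space = spec["obs_shape"]
        self.action_space = spec["num_actions"]

env = SingleAgentGameEnv()
print(env.observation_space, env.action_space)  # (10,) 4
```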

My issue:
In the multi-agent case I’m now in trouble: I can’t define the spaces within the environment because there are multiple action/observation spaces, so I have to define them within the PolicySpecs.
But that means I need to define the policies before starting the game, which is not possible in my case since I’m getting the spaces from the game.

My ideas:
I think there is a hacky way to make it work: give a callback to each env that is called after initialization and used to create the policies if there are none in the trainer yet.
But when using Tune, I don’t have any reference to the trainer until the first on_train_result callback, so that’s not possible.

Another way would be to run a fake environment (so basically launch the game once before starting the training) and retrieve the essential data up front. This might lead to annoying issues with remote execution in the future.

I saw some parts of the API for the agent_id → agent_space mapping:
Here a Dict space is used to map agent_id to sub-space: ray/test_nested_observation_spaces.py at 740def0a131a152a9408b22eaede28c62a848e3b · ray-project/ray · GitHub
Here there is a mention of multi-agent space mapping: ray/multi_agent_env.py at 740def0a131a152a9408b22eaede28c62a848e3b · ray-project/ray · GitHub. _check_if_space_maps_agent_id_to_sub_space() and _spaces_in_preferred_format seem useful for my case, but I cannot find any working examples, and most of this is marked as ExperimentalAPI.

Does anyone see a “cleaner” way to do it?
Thanks

Hey @TheTrope , could you try using dynamic policy creation via the Trainer.add_policy() method in connection with e.g. a custom callback, similar to how we do this in this self-play example script?

Sure, you would probably still have to provide one dummy policy in your multiagent.policies dict, but you don’t ever really need to use it, and its spaces wouldn’t matter.

Let us know if you need help with this.

Oh, the callback you should try for creating the new policies via Trainer.add_policy() is probably on_sub_environment_created.

Btw, could you post a simple, self-sufficient example script that describes your env setup and your problem? Doesn’t have to be the real env, just a shim env that has multiple obs/action spaces for the different agents and the Trainer you are trying to build. Would love to have this as an example in RLlib.

Hi @TheTrope,

I have an environment where I do not know the spaces beforehand. They change depending on the configuration.

I do what you mentioned. I create an environment and save the spaces, then I call env.close(). I have not had any issues with this method. What do you mean by “annoying issues with remote executions in the future”?
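That pattern (a throwaway “probe” env started once just to read the spaces) can be sketched like this; `GameEnv` is a hypothetical stand-in for the real game-backed env, with plain values standing in for gym spaces:

```python
# Sketch of the "probe env" pattern: start the env once, read its
# spaces, close it, then reuse the spaces when building the config.
# GameEnv is a hypothetical stand-in for the real game-backed env.

class GameEnv:
    def __init__(self):
        # Would normally launch the game here and query it for spaces.
        self.observation_space = {"agent0": (10,), "agent1": (20,)}
        self.action_space = {"agent0": 2, "agent1": 3}
        self.closed = False

    def close(self):
        # Would normally shut the game down again.
        self.closed = True

# Start the env once, copy out the spaces, and close it again.
probe = GameEnv()
obs_spaces = dict(probe.observation_space)
act_spaces = dict(probe.action_space)
probe.close()

# obs_spaces / act_spaces can now be used to fill in the PolicySpecs
# before tune.run() is called.
```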

Hi @sven1977

Unfortunately, on_sub_environment_created(self, *, worker: "RolloutWorker", sub_environment: EnvType, env_context: EnvContext, **kwargs) -> None doesn’t give me a reference to the trainer, but I think I could make it work by combining it with the on_trainer_init callback.
However, those are not available in 1.10 or the 1.11 preview on pip. I could install Ray from master, but I’d like to stay on official releases.

Here is an example of what I’d like to achieve:

I took the same idea as the MultiAgentTrafficEnv from the docs

However, in the docs the obs and action spaces are defined in the multiagent policies config; in my case, I want to define them in the environment (because I get them from the simulation).

In Ray 1.10, only a single action and obs space could be defined in the env, making it impossible to define multiple agent types within the env, so a policies config was required beforehand.

In the Ray 1.11 preview, it should be possible to define a dict mapping per-agent spaces this way:

self.observation_space = spaces.Dict({
    "car_0": CAR_OBS_SPACE,
    "car_1": CAR_OBS_SPACE,
    "traffic_light_0": TRAFFIC_LIGHT_OBS_SPACE,
    ...
})

This functionality has been implemented in the following commit: [RLlib] [MultiAgentEnv Refactor #2] Change space types for `BaseEnvs`… · ray-project/ray@39f8072 · GitHub
Unfortunately, it is not working as expected.

Hello @sven1977 ,

I’ve been trying again, and mapping agent IDs to per-agent action and observation spaces is still not working in 1.12.

It leads to an error: ValueError: The two structures don’t have the same nested structure.
Then it shows the two structures, the first being a sampled observation from one of the agents’ spaces, the second being a multi-agent dict.

Are there any new updates on it? The gist I uploaded in my previous post still shows the error, and here is a smaller version that reproduces the issue: https://gist.github.com/TheTrope/c0d8efafa3853caeb5a083a25af5bffe


Hey @TheTrope , you are right, the on_sub_environment_created won’t give you the Trainer object b/c this callback is done on the (@ray.remote) RolloutWorkers, which carry the envs and sub-envs and are Ray Actors that have no access to the Trainer objects (different processes).

I looked at the above gist and can’t seem to make sense of it though. Here is my problem:

  • Your env has different observation spaces for agent0 and for agent1.
  • Your policy_mapping_fn maps all agents (agent0 and agent1) to the “main” policy.
  • Thus, both agents use the same policy instance.
  • However, any policy can only be constructed with a single, well-defined observation space, whereas in your case, agent0 has a different space than agent1.

The following code is actually working, but - of course - we have to specify the spaces for the different policies here. What we could indeed do is try to automate the space inference even if different agents have different spaces. We could do this via a reverse mapping from agent IDs to policy IDs and then assign the agents’ spaces given by the env to the different policy IDs (automatically). This is currently not supported.
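The reverse-mapping idea could be sketched like this (pure Python, not RLlib code: `infer_policy_spaces` is a hypothetical helper, plain values stand in for real gym spaces, and `policy_mapping_fn` only takes an agent_id here for simplicity):

```python
# Sketch: infer per-policy spaces from the env's per-agent spaces by
# reversing the policy mapping. Plain values stand in for gym spaces.

def infer_policy_spaces(agent_spaces, policy_mapping_fn):
    policy_spaces = {}
    for agent_id, space in agent_spaces.items():
        pid = policy_mapping_fn(agent_id)
        if pid in policy_spaces and policy_spaces[pid] != space:
            # Two agents with different spaces map to one policy ->
            # a single policy cannot serve both.
            raise ValueError(f"Conflicting spaces for policy {pid!r}")
        policy_spaces[pid] = space
    return policy_spaces

agent_obs = {"agent0": (10,), "agent1": (20,)}
print(infer_policy_spaces(agent_obs, lambda aid: "main" + aid[-1]))
# {'main0': (10,), 'main1': (20,)}
```

Mapping both agents to a single policy would raise the ValueError, which is exactly the conflict described above.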

Working code:

import ray
from ray import tune
from ray.rllib.env.multi_agent_env import MultiAgentEnv
import gym
from ray.rllib.policy.policy import PolicySpec
from ray.rllib.examples.policy.random_policy import RandomPolicy


class BasicMultiAgentMultiSpaces(MultiAgentEnv):
    def __init__(self):
        self.agents = {"agent0", "agent1"}
        self.dones = set()
        # Replace the usual single env spaces (i.e. self.observation_space =
        # gym.spaces.Box(...)) with a multi-agent Dict space.
        self.observation_space = gym.spaces.Dict(
            {"agent0": gym.spaces.Box(low=-1.0, high=1.0, shape=(10,)),
             "agent1": gym.spaces.Box(low=-1.0, high=1.0, shape=(20,))})
        self.action_space = gym.spaces.Dict(
            {"agent0": gym.spaces.Discrete(2), "agent1": gym.spaces.Discrete(3)})
        self._agent_ids = set(self.agents)

        self._spaces_in_preferred_format = True
        super().__init__()

    def reset(self):
        self.dones = set()
        return {i: self.observation_space[i].sample() for i in self.agents}

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}
        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.observation_space[
                                                   i].sample(), 0.0, False, {}
            if done[i]:
                self.dones.add(i)
        done["__all__"] = len(self.dones) == len(self.agents)
        print("step")
        return obs, rew, done, info


def main():
    tune.register_env(
        "ExampleEnv",
        lambda c: BasicMultiAgentMultiSpaces()
    )

    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        # agent0 -> main0
        # agent1 -> main1
        return f"main{agent_id[-1]}"

    ray.init(local_mode=True)
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={
            "env": "ExampleEnv",
            "num_gpus": 0,
            "num_workers": 1,
            "multiagent": {
                "policies": {
                    "main0": PolicySpec(observation_space=gym.spaces.Box(low=-1.0, high=1.0, shape=(10,)), action_space=gym.spaces.Discrete(2)),
                    "main1": PolicySpec(observation_space=gym.spaces.Box(low=-1.0, high=1.0, shape=(20,)), action_space=gym.spaces.Discrete(3)),
                    "random": PolicySpec(policy_class=RandomPolicy),
                },
                "policy_mapping_fn": policy_mapping_fn,
                "policies_to_train": ["main0"]
            },
            "framework": "torch"
        }
    )


if __name__ == "__main__":
    main()

Thanks for answering @sven1977
Sorry for the confusion, you are right, my example was wrong: I forgot to assign them different policies.
Edit (after feedback): the error I had comes from not providing the obs/action spaces in the config’s policy specs. The point of this topic is to not provide them and instead infer them from the envs. This PR will fix it: [RLlib] Discussion 6060 and 5120: auto-infer different agents' spaces in multi-agent env. by sven1977 · Pull Request #24649 · ray-project/ray · GitHub

The issue remains in my project, where in its current state all agents have the same spaces and are mapped to the same policy.

However, it should work when all the agents have the same spaces, right? Example:

import ray
from ray import tune
from ray.rllib.env.multi_agent_env import MultiAgentEnv
import gym
from ray.rllib.policy.policy import PolicySpec
from ray.rllib.examples.policy.random_policy import RandomPolicy


class BasicMultiAgentMultiSpaces(MultiAgentEnv):
    OBS_SPACE = gym.spaces.Box(low=-1.0, high=1.0, shape=(10,))
    ACT_SPACE = gym.spaces.Discrete(2)

    def __init__(self):
        self.agents = {"agent0", "agent1"}
        self.dones = set()
        # Same idea as above: a multi-agent Dict space instead of a single space.
        self.observation_space = gym.spaces.Dict({"agent0": BasicMultiAgentMultiSpaces.OBS_SPACE, "agent1": BasicMultiAgentMultiSpaces.OBS_SPACE})
        self.action_space = gym.spaces.Dict({"agent0": BasicMultiAgentMultiSpaces.ACT_SPACE, "agent1": BasicMultiAgentMultiSpaces.ACT_SPACE})
        self._agent_ids = set(self.agents)

        self._spaces_in_preferred_format = True
        super().__init__()

    def reset(self):
        self.dones = set()
        return {i: self.observation_space[i].sample() for i in self.agents}

    def step(self, action_dict):
        obs, rew, done, info = {}, {}, {}, {}
        for i, action in action_dict.items():
            obs[i], rew[i], done[i], info[i] = self.observation_space[i].sample(), 0.0, False, {}
            if done[i]:
                self.dones.add(i)
        done["__all__"] = len(self.dones) == len(self.agents)
        print("step")
        return obs, rew, done, info


def main():

    tune.register_env(
        "ExampleEnv",
        lambda c: BasicMultiAgentMultiSpaces()
    )
    def policy_mapping_fn(agent_id, episode, worker, **kwargs):
        return "main"

    ray.init()
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 200},
        config={
            "env": "ExampleEnv",
            "num_gpus": 0,
            "num_workers": 1,
            "multiagent": {
                "policies": {
                    "main": PolicySpec(),
                    "random": PolicySpec(policy_class=RandomPolicy),
                },
                "policy_mapping_fn": policy_mapping_fn,
                "policies_to_train": ["main"]
            },
            "framework": "torch"
        }
    )

if __name__ == "__main__":
    main()

Here the two agents have the same space and are mapped to the “main” policy.
This returns the same error as before.