PolicyClient and QMix + MultiAgentEnv?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have an RL experiment that uses the QMix algorithm on a MultiAgentEnv. However, it is pretty slow because of some compute-intensive subprocesses, so I’m looking to use the client-server architecture described in the RLlib docs (Environments — Ray 3.0.0.dev0) to improve episodes/min.

Starting from the simple CartPole client-server example scripts, I’ve extended them to work with a toy MultiAgentEnv environment.

Here are the client-side parameters:

    grouping = {
        "group_1": [0],
    }
    obs_space = Tuple(
        [
            Box(float("-inf"), float("inf"), (4,))
        ]
    )
    act_space = Tuple(
        [
            Discrete(2)
        ]
    )
    env = TestEnv().with_agent_groups(grouping, obs_space=obs_space, act_space=act_space)

So it’s a simple single-agent, single-group MultiAgentEnv.
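
To make the grouping concrete, here is a minimal pure-Python sketch of what grouping does to a per-agent observation dict. `group_items` is a hypothetical helper written for illustration, not RLlib's API. It assumes the values of a group are collected into a list in the order given by the grouping, which is consistent with the `{'group_1': [...]}` shape that shows up in the error message.

```python
# Illustrative only: a hypothetical helper, not part of RLlib.
def group_items(items, grouping):
    """Collect per-agent values into one list per group."""
    grouped = {}
    for group_id, agent_ids in grouping.items():
        # Keep the agent order defined by the grouping.
        grouped[group_id] = [items[agent_id] for agent_id in agent_ids]
    return grouped


grouping = {"group_1": [0]}
per_agent_obs = {0: [0.03, -0.01, 0.04, 0.03]}  # agent 0's 4-dim observation

grouped_obs = group_items(per_agent_obs, grouping)
print(grouped_obs)  # {'group_1': [[0.03, -0.01, 0.04, 0.03]]}
```

With a single agent in a single group, the grouped observation is still a dict keyed by group id, not a bare observation vector.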

The server-side parameters are the following:

    obs_space = Tuple(
        [
            Box(float("-inf"), float("inf"), (4,))
        ]
    )
    act_space = Tuple(
        [
            Discrete(2)
        ]
    )

    config = (
        ...
        .environment(
            env=None,
            observation_space=obs_space,
            action_space=act_space,
        )
        ...
    )

When I run this, I get the following error:

ValueError: The two structures don't have the same nested structure.

First structure: type=tuple str=({'group_1': [array([ 0.03581125, -0.00619089,  0.04431266,  0.03354166], dtype=float32)]},)

Second structure: type=tuple str=(array([ 0.8475069 ,  0.6639858 , -0.13116916,  1.0029598 ], dtype=float32),)

More specifically: Substructure "type=dict str={'group_1': [array([ 0.03581125, -0.00619089,  0.04431266,  0.03354166], dtype=float32)]}" is a sequence,
while substructure "type=ndarray str=[ 0.8475069   0.6639858  -0.13116916  1.0029598 ]" is not
Entire first structure:
({'group_1': [.]},)
Entire second structure:
(.,)
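
The mismatch can be reproduced without Ray. The sketch below uses a hypothetical `skeleton` helper (not the library function that raised the error) to reduce each value to its container nesting. The two values mirror the first and second structures printed above, with `array.array` from the standard library standing in for the NumPy arrays and the numbers rounded.

```python
from array import array


# Illustrative only: a hypothetical helper, not the check that RLlib runs.
def skeleton(value):
    """Replace every leaf with '.', keeping only the container nesting."""
    if isinstance(value, dict):
        return {key: skeleton(item) for key, item in value.items()}
    if isinstance(value, tuple):
        return tuple(skeleton(item) for item in value)
    if isinstance(value, list):
        return [skeleton(item) for item in value]
    return "."  # anything else (e.g. an array) counts as a leaf


# Mirrors "First structure": a grouped observation dict inside a tuple.
first = ({"group_1": [array("f", [0.036, -0.006, 0.044, 0.034])]},)
# Mirrors "Second structure": a bare observation vector inside a tuple.
second = (array("f", [0.848, 0.664, -0.131, 1.003]),)

print(skeleton(first))
print(skeleton(second))
print(skeleton(first) == skeleton(second))  # False
```

The first value has a dict-of-list layer where the second has a bare leaf, which is the difference the error is reporting.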

I know there is an ExternalMultiAgentEnv but I’d like to use PolicyClient at the moment. Does PolicyClient not work with MultiAgentEnv?

I’d appreciate any help. Thanks in advance.

Cheers!

I figured it out. I’m posting the solution here for future reference.

  1. Your env class must inherit from both ExternalMultiAgentEnv and MultiAgentEnv.
  2. You need to add the following to your QMixConfig:

    multiagent_config = {
        "policies": {
            "main": (None, obs_space, act_space, {})
        },
        "policy_mapping_fn": lambda agent_id: "main"
    }

    config = (
        ...
        .multi_agent(**multiagent_config)
        ...
    )

Here the policy_mapping_fn lambda controls which policy each agent uses. The version above maps every agent to a single shared policy.
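
As a sketch of the difference between a shared and a per-agent mapping, the functions below are plain Python with no Ray dependency. The per-agent policy names and the extra `*args, **kwargs` are assumptions for illustration: depending on the RLlib version, the mapping function may also be passed arguments such as the episode and worker, which the catch-all parameters absorb.

```python
def shared_policy_mapping(agent_id, *args, **kwargs):
    """Every agent (or agent group) uses the single 'main' policy."""
    return "main"


def per_agent_policy_mapping(agent_id, *args, **kwargs):
    """Hypothetical alternative: a separate policy for each agent id."""
    return f"policy_{agent_id}"


# Hypothetical agent/group ids, just to show what each mapping returns.
for agent_id in ["group_1", "group_2"]:
    print(agent_id, shared_policy_mapping(agent_id), per_agent_policy_mapping(agent_id))
```

If a per-agent mapping like this were used, each returned name would also need a matching entry in the "policies" dict.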

Hope this helps someone!