GroupAgentsWrapper is interfering with metric tracking

So, I’m using QMIX, and part of setting up QMIX is using the environment created by MultiAgentEnv.with_agent_groups(groups, obs_space, act_space), where obs_space and act_space are Tuple spaces.

When adapting the two_step_game.py example file, we set up the group structure as group_id = {'agents': ['1', '2', '3', '4']} and then set self._agent_ids = {"agents"}.

This means that in policy_mapping_fn, the agent_id that gets passed in is "agents" (the group ID) rather than an individual ID such as 1, 2, 3, … .

    def policy_mapping_fn(agent_id, episode, **kwargs):
        if agent_id == 0:
            return '0'
        elif agent_id == 1:
            return '1'
        elif agent_id == 2:
            return '2'
        else:
            return '3'

This method usually works when we don’t have grouped agents, and all the data I want for each agent is then tracked properly.

I am currently using a custom model, so in the config, I need to set it up as:

    config["multiagent"] = {
        'policies': {
            "0": (None,
                  env.observation_space,
                  env.action_space,
                  {'model': {'custom_model': 'SimpleConv',
                             'custom_model_config': {}}}),
            "1": (None,
                  env.observation_space,
                  env.action_space,
                  {'model': {'custom_model': 'SimpleConv',
                             'custom_model_config': {}}}),
            "2": (None,
                  env.observation_space,
                  env.action_space,
                  {'model': {'custom_model': 'SimpleConv',
                             'custom_model_config': {}}}),
            "3": (None,
                  env.observation_space,
                  env.action_space,
                  {'model': {'custom_model': 'SimpleConv',
                             'custom_model_config': {}}}),
        },
        "policy_mapping_fn": policy_mapping_fn,
    }

However, with the grouped environment, the mapping function now receives "agents" rather than the individual agent IDs, so the metrics are not being tracked for all of the agents.
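
To make the effect concrete, here is a minimal plain-Python sketch of what grouping does to a per-agent dictionary. It is a simplified stand-in and not RLlib's GroupAgentsWrapper code; the helper group_obs and the sample observation values are hypothetical.

```python
# Hypothetical stand-in for the grouping step; NOT RLlib's implementation.
groups = {"agents": ["1", "2", "3", "4"]}


def group_obs(per_agent_obs, groups):
    """Collapse a per-agent observation dict into one entry per group."""
    grouped = {}
    for group_id, member_ids in groups.items():
        # Each group's observation is the list of its members' observations.
        grouped[group_id] = [per_agent_obs[a] for a in member_ids]
    return grouped


# Made-up observations, one per individual agent.
per_agent_obs = {"1": 0.1, "2": 0.2, "3": 0.3, "4": 0.4}
grouped = group_obs(per_agent_obs, groups)

# Only the group ID survives as a key, so that is the only ID a
# mapping function keyed on the grouped dict could ever receive.
print(list(grouped.keys()))
```

Under this assumption, the individual IDs '1' to '4' never reach the mapping function, which would be consistent with the behavior described above.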

What is the appropriate way to set up the policy mapping function when using the grouped environment, as is required for QMIX?

I am currently trying to break apart the grouping with something like:

    g1 = {f'group{k - 1}': [k - 1] for k in env.get_agent_ids()}

but that results in the following error:

...
File "C:\Users\Roque\AppData\Roaming\Python\Python38\site-packages\ray\rllib\execution\rollout_ops.py", line 99, in synchronous_parallel_sample
    sample_batches = ray.get(
  File "C:\Users\Roque\AppData\Roaming\Python\Python38\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Roque\AppData\Roaming\Python\Python38\site-packages\ray\worker.py", line 1831, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::RolloutWorker.sample() (pid=39368, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002111056DAC0>)
ValueError: The two structures don't have the same nested structure.
...
Entire first structure:
[.]
Entire second structure:
(., ., ., .)
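
One possible reading of the two structures in the error, stated as an assumption rather than a confirmed diagnosis: each single-agent group yields a one-element observation list, while the space it is checked against still has four elements. The sketch below reproduces that kind of mismatch in plain Python. The same_structure helper and the placeholder values are hypothetical, and the real check in the library may be stricter (for example, about list versus tuple types).

```python
def same_structure(a, b):
    """Return True if two nested list/tuple structures have the same shape."""
    a_seq = isinstance(a, (list, tuple))
    b_seq = isinstance(b, (list, tuple))
    if a_seq != b_seq:
        return False  # one side is a leaf, the other a sequence
    if not a_seq:
        return True  # both sides are leaves
    return len(a) == len(b) and all(
        same_structure(x, y) for x, y in zip(a, b)
    )


# Placeholder values standing in for "[.]" and "(., ., ., .)" in the error.
single_agent_group_obs = ["obs_0"]
four_agent_space = ("space_0", "space_1", "space_2", "space_3")

mismatch = not same_structure(single_agent_group_obs, four_agent_space)
print(mismatch)
```

If this reading is right, the spaces passed along with the per-agent grouping would need to describe one agent per group rather than all four.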

Any advice on how to use QMIX with custom models and how to properly set up the policy mapping function would be greatly appreciated.