Policy mapping and agent IDs in hierarchical env example

Hey everyone,

I’m writing a custom multi-agent environment that should implement multiple hierarchical agents.

So I looked into the hierarchical_training.py example, which uses the HierarchicalWindyMazeEnv. However, I’m getting confused about how exactly to implement the correct agent IDs in the environment code.

In the policy mapping function from hierarchical_training.py, the policies are mapped by agent ID. But in the HierarchicalWindyMazeEnv the agent IDs are used in two ways (see the sketch right after this list):

  1. as the key of the observation dict returned by the reset function, in the case of the high_level_agent,
    and
  2. via the self.low_level_agent_id attribute, for the low_level_agent.
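
To make sure I’m describing it right, here is a stripped-down sketch of that pattern as I understand it (class and helper names are my own placeholders, not the actual example code, and I’m leaving out the spaces and reward logic):

```python
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MySketchHierarchicalEnv(MultiAgentEnv):
    """Placeholder env mimicking the agent-ID pattern from the example."""

    def reset(self):
        self.num_high_level_steps = 0
        # (2) The low-level agent ID is kept as an attribute and changes over the episode.
        self.low_level_agent_id = "low_level_{}".format(self.num_high_level_steps)
        # (1) The high-level agent ID only appears as the key of the obs dict.
        return {"high_level_agent": self._high_level_obs()}

    def step(self, action_dict):
        if "high_level_agent" in action_dict:
            # A high-level action "spawns" a new low-level agent by changing its ID.
            self.num_high_level_steps += 1
            self.low_level_agent_id = "low_level_{}".format(self.num_high_level_steps)
            obs = {self.low_level_agent_id: self._low_level_obs()}
        else:
            # Regular low-level step: report back under the current low-level agent ID.
            obs = {self.low_level_agent_id: self._low_level_obs()}
        rewards = {agent_id: 0.0 for agent_id in obs}
        dones = {"__all__": False}
        return obs, rewards, dones, {}

    def _high_level_obs(self):
        return 0  # placeholder

    def _low_level_obs(self):
        return 0  # placeholder
```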

In contrast, the FlexAgentsMultiAgent class from the multi_agent.py example assigns the agent IDs while initializing the agents into a self.agents dict (using int values instead of strings), roughly like this:
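
Again, this is just my paraphrase of the pattern, with placeholder names:

```python
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MySketchFlexEnv(MultiAgentEnv):
    """Placeholder env that hands out integer agent IDs on the fly."""

    def __init__(self):
        super().__init__()
        self.agents = {}        # int agent ID -> per-agent state / sub-env
        self.next_agent_id = 0

    def _spawn(self):
        # Integer agent IDs are generated as new agents are added.
        agent_id = self.next_agent_id
        self.agents[agent_id] = {}  # placeholder for a sub-env or agent state
        self.next_agent_id += 1
        return agent_id

    def reset(self):
        self.agents = {}
        self.next_agent_id = 0
        first_id = self._spawn()
        # This time the obs dict is keyed by the int agent ID.
        return {first_id: 0}
```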

What’s the best practice here, and how exactly does Tune or the Trainer connect the policies to the agent IDs at runtime? For context, this is roughly how I’m wiring up the policies right now:
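
The spaces and policy IDs below are placeholders, and I know the exact policy_mapping_fn signature differs a bit between RLlib versions:

```python
from gym.spaces import Discrete
from ray import tune

# Placeholder spaces just to keep the snippet self-contained.
high_obs_space = high_act_space = Discrete(4)
low_obs_space = low_act_space = Discrete(4)


def policy_mapping_fn(agent_id, *args, **kwargs):
    # Gets called with the agent IDs that show up in the env's obs dicts;
    # the returned string is the policy ID that agent should use.
    if str(agent_id).startswith("low_level_"):
        return "low_level_policy"
    return "high_level_policy"


config = {
    "env": MySketchHierarchicalEnv,  # the sketch env from above
    "multiagent": {
        "policies": {
            # policy ID -> (policy_cls or None, obs_space, act_space, extra config)
            "high_level_policy": (None, high_obs_space, high_act_space, {}),
            "low_level_policy": (None, low_obs_space, low_act_space, {}),
        },
        "policy_mapping_fn": policy_mapping_fn,
    },
}

tune.run("PPO", config=config)
```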

File references: