Policy mapping and agentIDs in hierarchical env example

Blubberblub · August 13, 2021, 9:53am

Hey everyone,

I’m writing a custom multi-agent environment that should implement multiple hierarchical agents.

So I looked into the hierarchical_training.py example, which uses the HierarchicalWindyMazeEnv. However, I’m confused about how exactly to implement the correct agent IDs in the environment code.

In the policy mapping function from hierarchical_training.py, the policies are mapped by agent ID. But in the HierarchicalWindyMazeEnv, the agent IDs are defined in two ways:

as the key of the observation dict returned by the reset function, in the case of the high_level_agent
and
via self.low_level_agent_id, in the case of the low_level_agent.
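
To make the two cases concrete, here is a minimal sketch of how I read it. It is not the actual RLlib code: ToyHierarchicalEnv is a hypothetical stand-in that does not import RLlib and only returns observation dicts (a real step() would also return rewards, dones and infos). The "low_level_N" ID format and the two policy names are my assumptions about what the example does.

```python
# Toy stand-in for a hierarchical multi-agent env; no RLlib import needed.
# All names here are hypothetical and only illustrate the ID handling.


class ToyHierarchicalEnv:
    def __init__(self):
        self.num_high_level_steps = 0
        self.low_level_agent_id = None

    def reset(self):
        self.num_high_level_steps = 0
        self.low_level_agent_id = None
        # Case 1: the dict key itself is the agent ID of the high-level agent.
        return {"high_level_agent": 0}

    def step(self, action_dict):
        if "high_level_agent" in action_dict:
            # Case 2: a fresh low-level agent ID is built for each new goal
            # and stored on the env, then used as the next observation key.
            self.num_high_level_steps += 1
            self.low_level_agent_id = "low_level_{}".format(
                self.num_high_level_steps)
            return {self.low_level_agent_id: 0}
        # The low-level agent acted; for brevity, hand control straight back.
        return {"high_level_agent": 0}


def policy_mapping_fn(agent_id, *args, **kwargs):
    # Every ID that appears as a dict key is passed through this function.
    if agent_id.startswith("low_level_"):
        return "low_level_policy"
    return "high_level_policy"


env = ToyHierarchicalEnv()
first_obs = env.reset()
second_obs = env.step({"high_level_agent": 1})
for obs in (first_obs, second_obs):
    for agent_id in obs:
        print(agent_id, "->", policy_mapping_fn(agent_id))
```

If this reading is right, the env never registers agent IDs anywhere: whatever keys show up in the returned dicts are the agent IDs, and the mapping function only has to recognise them.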

In contrast, the FlexAgentsMultiAgent example class from the multi_agent.py example sets the agent IDs when it adds agents to a self.agents dict (using int values rather than strings).
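
For comparison, this is roughly the int-keyed pattern as I understand it. Again this is only a sketch: ToyFlexAgentsEnv, its spawn() helper and the "policy_0"/"policy_1" names are hypothetical and not copied from multi_agent.py.

```python
# Toy stand-in for an env that creates agents on the fly with int IDs.
# All names here are hypothetical; no RLlib import needed.


class ToyFlexAgentsEnv:
    def __init__(self):
        self.agents = {}        # agent ID (int) -> per-agent state
        self.next_agent_id = 0

    def spawn(self):
        # Register a new agent under the next free int ID.
        agent_id = self.next_agent_id
        self.agents[agent_id] = {"steps": 0}
        self.next_agent_id += 1
        return agent_id

    def reset(self):
        self.agents = {}
        self.next_agent_id = 0
        first_id = self.spawn()
        # The int ID is used directly as the observation dict key.
        return {first_id: 0}


def int_policy_mapping_fn(agent_id, *args, **kwargs):
    # With int IDs the mapping can use arithmetic instead of string prefixes.
    return "policy_{}".format(agent_id % 2)


env = ToyFlexAgentsEnv()
obs = env.reset()
new_id = env.spawn()
print(sorted(env.agents), int_policy_mapping_fn(new_id))
```

So the self.agents dict looks like internal bookkeeping of the env to me, while the keys of the returned dicts are still what the mapping function sees.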

What’s the best practice here, and how exactly does Tune or the trainer connect the policies to the agent IDs while running?

File references:

github.com

ray-project/ray/blob/master/rllib/examples/hierarchical_training.py

"""Example of hierarchical training using the multi-agent API.

The example env is that of a "windy maze". The agent observes the current wind
direction and can either choose to stand still, or move in that direction.

You can try out the env directly with:

    $ python hierarchical_training.py --flat

A simple hierarchical formulation involves a high-level agent that issues goals
(i.e., go north / south / east / west), and a low-level agent that executes
these goals over a number of time-steps. This can be implemented as a
multi-agent environment with a top-level agent and low-level agents spawned
for each higher-level action. The lower level agent is rewarded for moving
in the right direction.

You can try this formulation with:

    $ python hierarchical_training.py  # gets ~100 rew after ~100k timesteps


github.com

ray-project/ray/blob/master/rllib/examples/env/windy_maze_env.py

import gym
from gym.spaces import Box, Discrete, Tuple
import logging
import random

from ray.rllib.env import MultiAgentEnv

logger = logging.getLogger(__name__)

# Agent has to traverse the maze from the starting position S -> F
# Observation space [x_pos, y_pos, wind_direction]
# Action space: stay still OR move in current wind direction
MAP_DATA = """
#########
#S      #
####### #
      # #
      # #
####### #
#F      #


github.com

ray-project/ray/blob/master/rllib/examples/env/multi_agent.py

import gym
import random

from ray.rllib.env.multi_agent_env import MultiAgentEnv, make_multi_agent
from ray.rllib.examples.env.mock_env import MockEnv, MockEnv2
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.rllib.utils.annotations import Deprecated


@Deprecated(
    old="ray.rllib.examples.env.multi_agent.make_multiagent",
    new="ray.rllib.env.multi_agent_env.make_multi_agent",
    error=False)
def make_multiagent(env_name_or_creator):
    return make_multi_agent(env_name_or_creator)


class BasicMultiAgent(MultiAgentEnv):
    """Env of N independent agents, each of which exits after 25 steps."""

