I'm confused about how policy mapping works in configuration

Hi @Mehdi,

The names of the agents are defined in the environment you provide and are included as keys in the data provided by reset and step.
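To make that concrete, here is a minimal sketch of what "agent names as keys" looks like. `ToyTrafficEnv` and its agent names are hypothetical and it is not a real RLlib class. It assumes the older Gym-style API where reset returns only observations and step returns four dicts; newer versions return slightly different tuples.

```python
class ToyTrafficEnv:
    """Hypothetical two-agent environment, for illustration only."""

    def __init__(self):
        # The environment itself decides what the agents are called.
        self.agent_ids = ["car_0", "bike_0"]
        self.t = 0

    def reset(self):
        self.t = 0
        # One observation per agent, keyed by agent id.
        return {agent_id: 0 for agent_id in self.agent_ids}

    def step(self, action_dict):
        self.t += 1
        finished = self.t >= 3
        obs = {agent_id: self.t for agent_id in action_dict}
        rewards = {agent_id: 1.0 for agent_id in action_dict}
        dones = {agent_id: finished for agent_id in action_dict}
        # RLlib's multi-agent convention uses "__all__" to end the episode.
        dones["__all__"] = finished
        infos = {agent_id: {} for agent_id in action_dict}
        return obs, rewards, dones, infos


env = ToyTrafficEnv()
obs = env.reset()
print(sorted(obs))  # the agent ids show up as dict keys
obs, rewards, dones, infos = env.step({"car_0": 0, "bike_0": 1})
print(sorted(rewards))
```

Every dict coming back from the environment is keyed by the same agent names, and those keys are what the policy mapping function later receives.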

In RLlib algorithms, policies make the action decisions given observations from the environment. These policies are optimized by an RL algorithm during training.

In the RLlib config you need to define the policies you want to use to make action decisions. If you don’t specify any, a single policy called “default_policy” will be created.

You also need to create a policy mapping function that maps agent ids to policy ids. The exception is when you are using only default_policy: then you do not need to provide this mapping, because every agent is mapped to that one policy.
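As a rough sketch of how those two pieces fit together, assuming the dict-style config with a "multiagent" section and a version that accepts a plain collection of policy ids (the exact layout varies between RLlib versions, so check the docs for yours). The agent ids and the `AGENT_TO_POLICY` table are hypothetical:

```python
# Hypothetical agent ids; in practice these come from your environment.
AGENT_TO_POLICY = {
    "car_0": "car_policy",
    "car_1": "car_policy",
    "bike_0": "bike_policy",
}


def policy_mapping_fn(agent_id, *args, **kwargs):
    # Extra arguments (for example the episode) are accepted and ignored,
    # since different RLlib versions call this function differently.
    return AGENT_TO_POLICY[agent_id]


# Config fragment only; the rest of the algorithm config is omitted.
config = {
    "multiagent": {
        "policies": {"car_policy", "bike_policy"},
        "policy_mapping_fn": policy_mapping_fn,
    },
}
```

The only hard requirement is that every id the mapping function returns is also one of the policy ids you declared.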

Now here is the part I think you are confused by. There is no formal specification of the agent ids during configuration. They are implicit information in the environment that you either need to know ahead of time or need to retrieve through methods you write in your environment. The member _agent_ids is an attempt to remedy that implicit knowledge, but it is an RLlib convention and most environments do not have it.
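If you do not know the ids up front, one option is to ask the environment once before building the config. This is only a sketch: `discover_agent_ids` and `StubEnv` are hypothetical names, and it assumes reset returns either an observation dict or an (obs, info) tuple depending on the API version.

```python
def discover_agent_ids(env):
    """Reset the environment once and read the agent ids off the observation dict."""
    result = env.reset()
    # Newer Gym-style APIs return (obs, info); older ones return obs only.
    obs = result[0] if isinstance(result, tuple) else result
    return set(obs.keys())


class StubEnv:
    """Hypothetical stand-in for a real multi-agent environment."""

    def reset(self):
        return {"car_0": 0, "car_1": 0, "bike_0": 0}


print(sorted(discover_agent_ids(StubEnv())))  # ['bike_0', 'car_0', 'car_1']
```

Keep in mind this only finds agents that are present right after reset. If agents can appear later in an episode, you still need to know their naming scheme.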

You do not necessarily need to know the exact agent names ahead of time if they are named according to some convention. For example, perhaps you have an environment that has car agents (whose names are formatted like car_0, car_1, car_2, …) and bicycle agents (bike_0, bike_1, …), and you have two policies, one for cars (car_policy) and one for bicycles (bike_policy). You could write a policy mapping function like this:

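def agent_to_policy_map(agent_id, *args, **kwargs):
    # Newer RLlib versions may pass extra arguments (e.g. the episode); ignore them.
    # Route each agent to a policy id based on its name prefix.
    if agent_id.startswith("car"):
        return "car_policy"
    elif agent_id.startswith("bike"):
        return "bike_policy"
    else:
        raise ValueError(f"Unknown agent type: {agent_id}")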