Why are policies randomly assigned to agent_ids in the mapping function?

kia · April 24, 2021, 8:37am

The documentation implies that a single policy is mapped to a particular agent for the course of training. However, in the multi-agent examples, the policy mapping function uses a random choice to assign policies to agents. Why is this? Have I misunderstood the docs? Thank you!

mannyv · April 24, 2021, 12:41pm

Hi @kia,

You have the concept correct. Each agent is mapped to a policy. Any time RLlib sees a new agent returned from reset or step, it uses the policy mapping function to determine which policy to map it to. In a single-agent setup, RLlib automatically creates a policy called “default_policy” and maps all agents to it.

The examples just show how to define multiple policies in the multiagent dictionary along with a mapping function. Whoever wrote it decided to assign agents randomly. You can do something else in your function that is more appropriate for your environment.

https://docs.ray.io/en/master/rllib-env.html#multi-agent-and-hierarchical

In the example above they create three policies: one for traffic lights and two for cars. All the traffic lights in the environment always use the same policy, but the cars are randomly assigned to one of the two car policies. Even though the initial assignment is random, once an agent is assigned to a policy it will always use that same policy during that training session.
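As a rough sketch of what that looks like in plain Python: the first function mirrors the random assignment described for the docs example, and the second is one possible deterministic alternative. The agent IDs ("car_0", "traffic_light_1") and the constant `CAR_POLICIES` are made up for illustration, and the extra `*args, **kwargs` are only there because the exact arguments RLlib passes to the mapping function differ between versions, so check the docs for yours.

```python
import random

# Policy names follow the docs example described above; the agent ID
# format ("car_0", "traffic_light_1") is an assumption for illustration.
CAR_POLICIES = ["car1", "car2"]


def random_mapping_fn(agent_id, *args, **kwargs):
    """Traffic lights share one policy; each car gets a random car policy."""
    if agent_id.startswith("traffic_light_"):
        return "traffic_light"
    return random.choice(CAR_POLICIES)


def deterministic_mapping_fn(agent_id, *args, **kwargs):
    """Alternative: pick the car policy from the numeric suffix of the agent ID."""
    if agent_id.startswith("traffic_light_"):
        return "traffic_light"
    index = int(agent_id.rsplit("_", 1)[1])
    return CAR_POLICIES[index % len(CAR_POLICIES)]
```

Either function would then go under the "policy_mapping_fn" key of the multiagent dictionary in place of the lambda from the docs.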

kia · April 24, 2021, 1:01pm

That cleared it up, thanks!
