Multi agent use same policy

zzchuman · June 26, 2021, 3:29am

Hello, I want to set up 9 agents.

I configured it like this:
"multiagent": {
    "policies": {
        "0": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "1": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "2": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "3": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "4": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "5": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "6": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "7": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
        "8": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
    },
    "policy_mapping_fn": policy_mapping  # Traffic lights are always controlled by this policy
},
With this, all 9 agents work.

But I want all 9 agents to use the same policy. If I set it up like this:
"multiagent": {
    "policies": {
        "0": (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.array(["inf"] * 20)), spaces.Discrete(4), {}),
    },
    "policy_mapping_fn": policy_mapping  # Traffic lights are always controlled by this policy
},

How many agents will work?

mannyv · June 26, 2021, 10:57am

Hi @zzchuman,

If your policy mapping function is like the one below, then every agent in an episode will use the same instance of a single policy.

"policy_mapping_fn": lambda _: "0"
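
To make the answer to "how many agents work" concrete, here is a rough plain-Python illustration (no RLlib import). The agent names `agent_0` through `agent_8` are assumed for the example; the real names come from your environment.

```python
# Hypothetical agent IDs; the real names are whatever your environment returns.
agent_ids = [f"agent_{i}" for i in range(9)]

# Constant mapping: every agent ID resolves to the policy key "0".
policy_mapping_fn = lambda _: "0"

# Which policy key does each agent end up with?
assignments = {agent_id: policy_mapping_fn(agent_id) for agent_id in agent_ids}
policies_in_use = set(assignments.values())

print(len(assignments))  # number of agents acting in the episode
print(policies_in_use)   # distinct policy keys they act through
```

All nine agents still act; they just act through one policy. In some later RLlib releases the mapping function is called with extra arguments besides the agent ID, so a signature such as `lambda agent_id, *args, **kwargs: "0"` may be needed. Check the documentation for your version.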

zzchuman · June 26, 2021, 11:41am

Hello, mannyv! Thank you! You mean that if I set it like this:
"policy_mapping_fn": lambda _: "0"

then all agents will use the same policy?

And does each agent use the same NN, or only the same NN parameters? I am not sure.

Can you explain?

mannyv · June 26, 2021, 12:19pm

@zzchuman

The keys of the policies dictionary define your policies; in RLlib these are almost always neural networks. For every key in the policies dictionary, RLlib creates one policy, which is usually a collection of neural networks: for example, a network for the actions, a target network for the actions, and a network for the value function. Each key constructs one policy, and every agent mapped to the same key uses the same policy, which means they all share the same parameters, whether those are neural network parameters or other parameters.

Now for agents in an episode of an environment. Every time RLlib encounters an agent with a name it has not seen before, it uses the policy_mapping_fn to assign a policy to that agent. The mapping function in the example above assigns every agent in the environment, regardless of its name, to the same policy. In this case every agent uses exactly the same policy, with the same neural networks and the same parameters.
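
A toy sketch of that idea in plain Python may help. This is not RLlib's actual implementation: `ToyPolicy` and `assign_policies` are hypothetical names made up for the illustration, and the agent IDs are assumed.

```python
class ToyPolicy:
    """Stand-in for an RLlib policy: it only holds a list of 'weights'."""

    def __init__(self, key):
        self.key = key
        self.weights = [0.0, 0.0]


def assign_policies(agent_ids, policies, policy_mapping_fn):
    """Assign a policy to each agent the first time its ID is seen."""
    agent_to_policy = {}
    for agent_id in agent_ids:
        if agent_id not in agent_to_policy:  # unseen agent: ask the mapping fn
            agent_to_policy[agent_id] = policies[policy_mapping_fn(agent_id)]
    return agent_to_policy


# One key in the policies dict means one policy object is created.
policies = {"0": ToyPolicy("0")}
agent_ids = [f"agent_{i}" for i in range(9)]  # hypothetical agent names
agent_to_policy = assign_policies(agent_ids, policies, lambda _: "0")

# Every agent holds a reference to the same object, so an update to the
# shared policy's parameters is seen by all nine agents at once.
policies["0"].weights[0] = 1.5
shared = all(p is policies["0"] for p in agent_to_policy.values())
print(shared, agent_to_policy["agent_8"].weights)
```

The point of the sketch is that agents mapped to the same key do not get copies of a network; they act through one and the same set of parameters.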

In your first example let’s say you had an environment with 9 agents and you wanted each one to use a different policy. In this case you have policies with keys {‘0’…‘8’}.

If the names of your agents in the environment were {"agent_0", "agent_1", …, "agent_8"}, you could write a mapping function like this:

"policy_mapping_fn": lambda agent_id: agent_id[-1]

This would extract the last character of the agent ID and use that to choose the policy to use.
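
As a quick check of that mapping in plain Python (agent names assumed to follow the `agent_<n>` pattern above), with a hedged alternative: `suffix_mapping` is a hypothetical helper that takes everything after the last underscore, which may be safer if you ever have more than ten agents.

```python
agent_ids = [f"agent_{i}" for i in range(9)]  # "agent_0" ... "agent_8"


def last_char_mapping(agent_id):
    """Use the last character of the agent ID as the policy key."""
    return agent_id[-1]


def suffix_mapping(agent_id):
    """Use everything after the last underscore as the policy key."""
    return agent_id.split("_")[-1]


# With nine agents, each agent gets its own policy key "0" ... "8".
per_agent = {agent_id: last_char_mapping(agent_id) for agent_id in agent_ids}
print(per_agent)

# With ten or more agents the last character is no longer unique:
# "agent_10" would collide with "agent_0" under last_char_mapping.
print(last_char_mapping("agent_10"), suffix_mapping("agent_10"))
```

For the nine-agent case in this thread both functions give the same result; the difference only matters once agent indices reach two digits.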

zzchuman · June 26, 2021, 12:44pm

Thank you very much! I got it!

zzchuman · June 26, 2021, 3:17pm

I have another question. If I set all agents to use the same policy or NN, that is policy sharing, right?

mannyv · June 26, 2021, 6:13pm

Yes, that would be policy / parameter sharing. It would not be centralized training, though, since the policy only ever sees one agent's observations at a time.

zzchuman · June 26, 2021, 10:35pm

Thank you! I got it: it would be policy or parameter sharing, but it is still decentralized control.
