Multiple agents using the same policy

Hello, I want to set up 9 agents.

I set it up like this:
"multiagent": {
    "policies": {
        '0': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '1': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '2': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '3': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '4': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '5': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '6': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '7': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
        '8': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {})
    },
    "policy_mapping_fn": policy_mapping  # Traffic lights are always controlled by this policy
},
With this, all 9 agents work.

But I want all 9 agents to use the same policy. If I set it up like this:
"multiagent": {
    "policies": {
        '0': (PPOTFPolicy, spaces.Box(low=np.zeros(20), high=np.full(20, np.inf)), spaces.Discrete(4), {}),
    },
    "policy_mapping_fn": policy_mapping  # Traffic lights are always controlled by this policy
},

How many agents will work?

Hi @zzchuman,

If your policy mapping function looks like the one below, every agent in an episode will use the same instance of a single policy.

"policy_mapping_fn": lambda _: '0'
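The one-argument lambda matches the RLlib version used in this thread. As far as I know, newer RLlib releases call policy_mapping_fn with extra arguments (such as the episode and worker, plus keyword arguments), so a sketch that tolerates both calling styles could look like this. The name policy_mapping is only illustrative:

```python
def policy_mapping(agent_id, *args, **kwargs):
    """Map every agent to the single policy stored under key '0'.

    Any extra positional/keyword arguments (which some RLlib versions pass,
    e.g. the episode) are accepted and ignored.
    """
    return "0"


# Every agent ID maps to the same policy key, whatever the agent is called.
for agent_id in ["agent_0", "agent_4", "traffic_light_8"]:
    print(agent_id, "->", policy_mapping(agent_id))
```

In the config you would then write "policy_mapping_fn": policy_mapping.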

Hello, mannyu! Thank you! Do you mean that if I set it like this:
"policy_mapping_fn": lambda _: '0'

then all the agents will use the same policy?

And does each agent use the same NN, or just the same NN parameters? I am not sure.

Can you explain this to me?

@zzchuman

The keys of the policies dictionary define your policies; in RLlib these are almost always neural networks. For every key in the policies dictionary, RLlib creates one policy, which is usually a collection of neural networks: for example, a network for the actions, a target network for the actions, and a network for the value function. Each key constructs exactly one policy, and every agent mapped to the same key uses that same policy. That means they all share the same parameters, whether those are neural network weights or other parameters.
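A toy, pure-Python illustration of "one key, one policy object" (this is not RLlib code; ToyPolicy is a hypothetical stand-in for a real policy and only holds a list of weights):

```python
class ToyPolicy:
    """Hypothetical stand-in for an RLlib policy: it only holds weights."""

    def __init__(self, obs_dim, num_actions):
        # One weight row per action; a real policy would hold NN weights.
        self.weights = [[0.0] * obs_dim for _ in range(num_actions)]


# Analogue of the "policies" dict: each key builds exactly one policy object.
shared = {"0": ToyPolicy(20, 4)}
separate = {str(i): ToyPolicy(20, 4) for i in range(9)}

# Two agents mapped to key '0' receive the very same object...
policy_for_agent_0 = shared["0"]
policy_for_agent_1 = shared["0"]
print(policy_for_agent_0 is policy_for_agent_1)  # True

# ...so an update made through one is seen by the other (shared parameters).
policy_for_agent_0.weights[0][0] = 1.5
print(policy_for_agent_1.weights[0][0])  # 1.5

# Nine keys, by contrast, give nine independent policies.
print(len({id(p) for p in separate.values()}))  # 9
```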

Now consider the agents in an episode of an environment. Every time RLlib encounters an agent ID it has not seen before in the episode, it calls policy_mapping_fn to assign a policy to that agent. With the lambda _: '0' mapping function above, every agent in the environment, regardless of its name, is assigned to the exact same policy, so every agent uses exactly the same neural networks with exactly the same parameters.
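To make the assignment step concrete, here is a hypothetical, simplified simulation of it in plain Python (run_episode is made up for illustration and is not how RLlib is implemented internally): the mapping function is consulted only the first time an agent ID shows up, and the result is reused afterwards.

```python
def run_episode(steps, policy_mapping_fn, policies):
    """Hypothetical sketch of agent -> policy assignment within one episode.

    policy_mapping_fn is called only the first time an agent ID appears;
    later steps reuse the cached assignment.
    """
    assignment = {}        # agent_id -> policy key, cached for this episode
    used_policies = set()  # ids of the policy objects that actually acted
    calls = 0
    for agents_this_step in steps:
        for agent_id in agents_this_step:
            if agent_id not in assignment:
                assignment[agent_id] = policy_mapping_fn(agent_id)
                calls += 1
            used_policies.add(id(policies[assignment[agent_id]]))
    return assignment, used_policies, calls


policies = {"0": object()}                        # the single shared policy
steps = [[f"agent_{i}" for i in range(9)]] * 3   # 9 agents acting for 3 steps
assignment, used_policies, calls = run_episode(steps, lambda _: "0", policies)

print(calls)                      # 9 (one call per new agent ID)
print(set(assignment.values()))  # {'0'}
print(len(used_policies))        # 1 (every agent used the same object)
```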

In your first example, let's say you have an environment with 9 agents and you want each one to use a different policy. In that case you have policies with keys {'0' … '8'}.

If your agents in the environment were named {"agent_0", "agent_1", …, "agent_8"}, you could write the mapping function like this:

"policy_mapping_fn": lambda agent_id: agent_id[-1]

This extracts the last character of the agent ID and uses it as the key of the policy to use.
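A quick pure-Python check of this mapping with the agent names assumed above. The suffix_mapping variant is a hypothetical alternative (not from this thread) that keeps working once agent IDs have more than one digit:

```python
policy_keys = [str(i) for i in range(9)]  # policies '0' ... '8'
agents = [f"agent_{i}" for i in range(9)]


def last_char_mapping(agent_id):
    # Mapping from the answer above: use the last character of the agent ID.
    return agent_id[-1]


def suffix_mapping(agent_id):
    # Hypothetical variant: use everything after the last underscore,
    # so "agent_10" maps to '10' instead of '0'.
    return agent_id.rsplit("_", 1)[-1]


# Each of the 9 agents gets its own policy key.
print([last_char_mapping(a) for a in agents] == policy_keys)  # True

# With 10 or more agents, the last-character version collides with agent_0.
print(last_char_mapping("agent_10"), suffix_mapping("agent_10"))  # 0 10
```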

Thank you very much! I got it!

I have another question. If I set all agents to use the same policy (the same NN), is that called policy sharing? Right?

Yes, that would be policy / parameter sharing. It would not be centralized training, though, since the policy only ever sees one agent's observations at a time.
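A toy example of the distinction (plain Python, not RLlib; shared_policy, the weights, and the observations are all hypothetical): one set of weights is shared by every agent, but each action is computed from a single agent's own observation. A centralized critic, as used in some centralized-training methods, would typically take the joint observation of all agents as input instead.

```python
def shared_policy(weights, obs):
    """Hypothetical linear policy: argmax action for ONE agent's observation."""
    scores = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# One set of parameters shared by every agent (parameter sharing).
weights = [[1.0, 0.0], [0.0, 1.0]]  # 2 actions, observation size 2
observations = {"agent_0": [0.9, 0.1], "agent_1": [0.2, 0.8]}

# Each action depends only on that agent's own observation (decentralized).
actions = {aid: shared_policy(weights, obs) for aid, obs in observations.items()}
print(actions)  # {'agent_0': 0, 'agent_1': 1}

# A centralized critic would instead see the concatenated joint observation.
joint_obs = [x for obs in observations.values() for x in obs]
print(len(joint_obs))  # 4
```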

Thank you! I got it: it would be policy or parameter sharing, but it is still decentralized control.