Adding virtual agents in MARL

Hi, I’m working on a MARL env where virtual agents are added randomly at training time. More specifically, I have agents A1, A2, … Each agent has its own unshared model. How should I approach it if I plan to add virtual1_A1, which behaves independently of A1 but uses the same model as A1? This is kind of tricky since they use the same policy, but we need to make sure they only see their own hidden states.

Here’s my idea: since I don’t need to enumerate up front which agents will be in the environment, I can just specify in my policy_mapping_fn that all agents whose IDs end with A1 use the same policy (roughly as in the sketch below). This should make sure virtual1_A1 doesn’t share hidden states with A1. My concern is that this will probably cause the experience collected from A1 and virtual1_A1 to update their policy sequentially, since they are treated as different agents.
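Roughly what I have in mind, just as a sketch (the exact policy_mapping_fn signature varies across RLlib versions, and the policy names here are made up):

```python
def policy_mapping_fn(agent_id, *args, **kwargs):
    # Extra args (episode, worker, ...) that some RLlib versions pass in
    # are ignored here.
    # "A1" and "virtual1_A1" both end with "A1", so they map to one policy.
    if agent_id.endswith("A1"):
        return "policy_A1"
    if agent_id.endswith("A2"):
        return "policy_A2"
    return "default_policy"
```

This would then go into the multi-agent part of the config as the policy_mapping_fn.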

Should I worry about this sequential nature of the update? Is there a way to merge the buffers at learning time, or bypass the splitting altogether?

Hi @Aceticia,

Your idea is good. Any number of agents can share the same policy. Each will use the policy independently during execution (sampling rollouts).

During training, if you do not add any centralizing pieces, for example a centralized critic, or use an algorithm like QMIX or MADDPG, then each transition is considered separately for each agent.

During the actual loss calculation, the losses are computed in separate batches split by policy, not by agent. So if you have 3 agents that all map to the same policy, the transitions from each of those 3 agents will be combined in the loss calculation. Again, keep in mind that if it is not a multi-agent algorithm, the loss at each time step for each agent is considered independently, and they are all averaged at the end.
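To make that concrete, here is a toy illustration in plain Python (not RLlib internals) of what "batched by policy, not by agent" means; the agent and policy names match the example above:

```python
from collections import defaultdict

# Toy transitions collected from three agents; "A1" and "virtual1_A1"
# map to the same policy, so their samples land in one training batch.
rollout = [
    {"agent_id": "A1",          "obs": 0.1, "action": 1, "reward": 0.5},
    {"agent_id": "virtual1_A1", "obs": 0.2, "action": 0, "reward": 0.3},
    {"agent_id": "A2",          "obs": 0.3, "action": 1, "reward": 0.1},
]

def policy_mapping_fn(agent_id):
    return "policy_A1" if agent_id.endswith("A1") else "policy_A2"

# Group transitions by policy, not by agent: this is the batch each
# policy's loss sees, so the per-agent split disappears at learning time.
batches = defaultdict(list)
for t in rollout:
    batches[policy_mapping_fn(t["agent_id"])].append(t)

for policy_id, batch in batches.items():
    print(policy_id, len(batch), "transitions")
# policy_A1 gets 2 transitions (A1 + virtual1_A1), policy_A2 gets 1.
```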

Policies are updated sequentially, one at a time, in a loop.
