Workflow for Multi-Agent training

MRuiz · January 7, 2022, 3:31pm

Hello all, can you help me understand more intuitively how the multi-agent training process works? For example, in the case of multi-agent PPO, does the trainer minimize the loss functions of all the agents at the same time, or does it try to minimize the sum of the losses?

I understand how other algorithms like MADDPG share the critic, but I couldn’t find any documentation on how multi-agent PPO works.

sven1977 · January 12, 2022, 9:24am

Hey @MRuiz, thanks for this question!

Yes, RLlib has basically two different multi-agent approaches:

1. Specialized MA algos, such as QMIX and MADDPG, which train a centralized critic model and output actions as a single (Tuple) action.
2. Independent MA learning, where each policy you define gets updated separately on its own experience data from the environment. This is what happens for all the other Trainers (not QMIX/MADDPG) whenever you specify the “multiagent” sub-config and provide one or more policies (with their classes, action/obs spaces, and config overrides), an agentID->policyID mapping fn, etc. In this case, yes, all the policies’ losses are minimized separately. There is no shared loss, network, or anything else. Each policy learns by itself and treats all the other agents as part of the environment.
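
To make the independent-learning case concrete, here is a minimal toy sketch in plain Python. It is not RLlib code: `ToyPolicy`, `train_step`, the agent IDs, and the policy IDs are all hypothetical names made up for illustration, and the "loss" is a simple squared error rather than the PPO loss. It only shows the control flow described above: samples are routed to policies through a mapping function, and each policy then takes a gradient step on its own loss, using only its own batch and its own parameters.

```python
from collections import defaultdict


class ToyPolicy:
    """Hypothetical stand-in for one policy: a single weight w, squared-error loss."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr

    def loss(self, batch):
        # Mean squared error between the prediction w * x and the target y.
        return sum((self.w * x - y) ** 2 for x, y in batch) / len(batch)

    def learn_on_batch(self, batch):
        # One gradient-descent step on this policy's own loss only.
        grad = sum(2 * (self.w * x - y) * x for x, y in batch) / len(batch)
        self.w -= self.lr * grad
        return self.loss(batch)


def policy_mapping_fn(agent_id):
    # Hypothetical naming scheme: agents "a_*" use policy_a, all others policy_b.
    return "policy_a" if agent_id.startswith("a_") else "policy_b"


def train_step(policies, samples_by_agent):
    # Route every agent's samples to the policy that controls that agent.
    batches = defaultdict(list)
    for agent_id, samples in samples_by_agent.items():
        batches[policy_mapping_fn(agent_id)].extend(samples)
    # Each policy is updated separately; no loss or parameter is shared.
    return {pid: policies[pid].learn_on_batch(batch) for pid, batch in batches.items()}


policies = {"policy_a": ToyPolicy(), "policy_b": ToyPolicy()}
samples_by_agent = {
    "a_0": [(1.0, 2.0), (2.0, 4.0)],    # targets follow y = 2x
    "a_1": [(3.0, 6.0)],                # same policy as a_0, so data is pooled
    "b_0": [(1.0, -1.0), (2.0, -2.0)],  # targets follow y = -x
}

for _ in range(200):
    losses = train_step(policies, samples_by_agent)

print({pid: round(p.w, 3) for pid, p in policies.items()})
```

In this sketch, agents `a_0` and `a_1` map to the same policy, so their experience is pooled into one batch, while `policy_b` never sees that data and its weight is untouched by `policy_a`'s updates.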

MRuiz · January 12, 2022, 2:42pm

Thank you very much for your reply, @sven1977!
