Hello all, can you help me understand more intuitively how the multi-agent training process works? For example, in the case of multi-agent PPO, does the trainer minimize each agent's loss function separately at the same time, or does it minimize the sum of all the agents' losses?

I understand how other algorithms like MADDPG share a centralized critic, but I couldn't find any documentation on how multi-agent PPO works.