Cross-agent pooling in decentralized multi-agent execution

Hello folks, I would like to do something similar to the “cross-hero pool” mentioned in the OpenAI Five (Dota 2) paper. Here is a screenshot of a short description from the paper:

In short, I have five shared-parameter agents, and I want to concatenate a team-wide feature (the max-pool over all agents' FC outputs) to each agent's individual FC output as a way to share information across the team.
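For concreteness, here is a minimal NumPy sketch of the pooling step I have in mind (the function name and the layout, per-agent FC outputs stacked along the first axis, are my own assumptions, not from the paper):

```python
import numpy as np

def cross_agent_pool(fc_outputs):
    """Concatenate a team-wide max-pooled feature to each agent's FC output.

    fc_outputs: (num_agents, feat) array, one row per agent.
    Returns:    (num_agents, 2 * feat) array.
    """
    # Max-pool across the agent dimension to get one team-wide feature vector.
    team_feat = fc_outputs.max(axis=0)                      # (feat,)
    # Broadcast the pooled feature back to every agent and concatenate it.
    tiled = np.broadcast_to(team_feat, fc_outputs.shape)    # (num_agents, feat)
    return np.concatenate([fc_outputs, tiled], axis=-1)
```

In a real model this would be a tensor op inside the forward pass, but the key point is that the max-pool needs all agents' FC outputs available in one place at the same time, which is exactly the difficulty described below.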

My environment is a MultiAgentEnv, so I do have access to all agents' observations during each step call. I could therefore share observations by adding every agent's observation to each individual agent's observation dict. However, this would be terribly inefficient, because I would then repeatedly run model inference on the same shared observations in every agent's forward call.

Another option is centralized execution, but I would prefer to do it in a decentralized fashion because MultiAgentBatch is very handy to use; I like how it handles early-exiting agents automatically.

Any ideas on how to do this better? Thanks in advance.

Hi @mannyv, I believe you are very experienced with RLlib; would you mind sharing some insights, please?