Hello folks, I would like to do something similar to the “cross-hero pool” mentioned in the OpenAI Five (Dota 2) paper. Here is a screenshot of a short description from the paper:
In simple terms, I have five agents that share parameters. I want to compute a team-wise feature by max-pooling over all agents’ FC outputs, then concatenate that feature to each individual agent’s FC output as a means of sharing information across the team.
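To make the operation concrete, here is a minimal sketch of the pooling step as I understand it. It uses plain NumPy rather than an actual model layer, and the function name `cross_agent_pool` and the `active` mask are hypothetical names made up for illustration; they are not part of RLlib or the paper.

```python
import numpy as np


def cross_agent_pool(fc_out, active=None):
    """Append a team-wise max-pooled feature to each agent's FC output.

    fc_out: array of shape (n_agents, d), one row per agent.
    active: optional boolean array of shape (n_agents,). Agents marked
        False (e.g. ones that already exited) are ignored when pooling.
        This mask is an assumption of this sketch.
    Returns an array of shape (n_agents, 2 * d).
    """
    fc_out = np.asarray(fc_out, dtype=np.float64)
    n_agents = fc_out.shape[0]
    if active is None:
        active = np.ones(n_agents, dtype=bool)
    active = np.asarray(active, dtype=bool)
    if not active.any():
        raise ValueError("at least one agent must be active")
    # Element-wise max over the active agents only.
    team = fc_out[active].max(axis=0)
    # Give every agent the same team feature and concatenate it
    # to that agent's own FC output.
    team_tiled = np.broadcast_to(team, fc_out.shape)
    return np.concatenate([fc_out, team_tiled], axis=1)


# Three agents with 2-dimensional FC outputs; the third has exited.
fc = np.array([[1.0, 5.0], [3.0, 2.0], [0.0, 9.0]])
pooled = cross_agent_pool(fc, active=[True, True, False])
```

In a real model the same thing would presumably be done with the framework's own max and concat ops so that gradients flow through the pooled feature; the open question is where all five FC outputs are available at once.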
My environment is a MultiAgentEnv, so I have access to all agents’ observations during each step call. I could therefore share observations by adding every agent’s observation to each individual agent’s observation dict. However, this would be terribly inefficient, because the model would have to run inference on the same shared observations again in every agent’s forward call.
Another option is perhaps centralized execution. But I would like to do this in a decentralized fashion because MultiAgentBatch is very handy to use. I like how it handles early-exiting agents automatically.
Any ideas on how to do this better? Thanks in advance.