Multi-Agent Transformer

Status: None. Just asking a question out of curiosity.
The recent paper, Multi-Agent Reinforcement Learning is a Sequence Modeling Problem, introduced a novel Multi-Agent Transformer (MAT) algorithm for MARL. MAT achieves SOTA results on several MARL benchmarks and seems well suited to distributed training. My questions are as follows:

  1. Do the prolific RLlib authors have plans to implement this algorithm (or similar ones), as they did with QMIX?
  2. If not, how would one design a custom agent for this task? Can Ray Core/Clusters work with such transformer models?
  3. Would you care about such an implementation?

I hope this finds you all well. Best,


Hi @Aidan_McLaughlin ,

We have a DT implementation that was merged into master 28 days ago.
Other than that, afaik, there are no near-term plans on our roadmap to implement this paper. Skimming the paper, Ray Clusters should certainly be able to handle such a workload. Models with such large numbers of parameters are limited only by the memory of your cluster's nodes.
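As a rough illustration of that memory limit, here is a back-of-envelope sketch. The parameter count and multipliers below are assumptions for illustration, not figures from the paper:

```python
def estimate_training_memory_gb(n_params: int,
                                bytes_per_param: int = 4,
                                state_multiplier: int = 4) -> float:
    """Rough fp32 training-memory estimate per model replica.

    state_multiplier=4 accounts for weights, gradients, and the two
    Adam moment buffers (a common rule of thumb, not an exact figure;
    activations and replay buffers add more on top).
    """
    return n_params * bytes_per_param * state_multiplier / 1024**3

# Hypothetical 100M-parameter transformer:
print(round(estimate_training_memory_gb(100_000_000), 2))  # ~1.49 GB
```

If that number fits comfortably under a node's RAM (or GPU memory), the cluster itself is not the bottleneck.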

And lastly: yes, please do care about an implementation. We very much welcome community contributions, even though we have no plans to implement this ourselves. If it is well executed and aligned with our APIs, we will certainly merge it. Depending on how much maintenance it needs, you may have to help us maintain it, though. If you are interested, please open a PR early so we can give feedback from time to time.



Hey Arturn, thanks for the thoughtful response. Will the DT implementation work with MultiAgentEnv? Best,

Hi @arturn and @Aidan_McLaughlin,

There is a big conceptual difference between DT and MAT.

DT uses a transformer that operates over consecutive timesteps in a trajectory.

MAT uses a transformer to operate autoregressively over agent actions within a single timestep.
AFAIK the transformer is not used across timesteps.
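To make the distinction concrete, here is a minimal shapes-only sketch of the token sequences each model attends over (all names and dimensions are hypothetical):

```python
import numpy as np

T, N, obs_dim = 8, 3, 5               # timesteps, agents, obs features
trajectory = np.zeros((T, N, obs_dim))

# DT: one agent's trajectory becomes a sequence of T timestep tokens;
# attention runs along the *time* axis.
dt_tokens = trajectory[:, 0, :]        # shape (T, obs_dim) = (8, 5)

# MAT: at a single timestep, the N agents form the sequence; the decoder
# emits actions agent-by-agent, autoregressively, so attention runs
# along the *agent* axis within that one timestep.
mat_tokens = trajectory[4, :, :]       # shape (N, obs_dim) = (3, 5)

print(dt_tokens.shape, mat_tokens.shape)  # (8, 5) (3, 5)
```

So the two models reuse the same architecture but slice the data along orthogonal axes.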


@mannyv Thanks for the concise explanation!

@Aidan_McLaughlin have you decided whether you want to implement the paper yet? :slight_smile:
MultiAgentEnv does not care what kind of policy you use to produce actions.
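To illustrate: a MultiAgentEnv exchanges per-agent dicts, so any action-producing policy (MAT, DT, or otherwise) plugs in the same way. The toy class below only mimics RLlib's dict interface without importing Ray; all names are illustrative, and real code would subclass `ray.rllib.env.multi_agent_env.MultiAgentEnv`:

```python
class ToyMultiAgentEnv:
    """Minimal stand-in mimicking RLlib's MultiAgentEnv dict contract."""

    def __init__(self, n_agents: int = 3):
        self.agent_ids = [f"agent_{i}" for i in range(n_agents)]
        self.t = 0

    def reset(self):
        self.t = 0
        return {aid: [0.0] for aid in self.agent_ids}  # per-agent obs dict

    def step(self, action_dict):
        # action_dict maps agent id -> action; how the actions were
        # produced (transformer, random, ...) is invisible to the env.
        self.t += 1
        obs = {aid: [float(self.t)] for aid in action_dict}
        rewards = {aid: 1.0 for aid in action_dict}
        dones = {"__all__": self.t >= 5}
        return obs, rewards, dones, {}

env = ToyMultiAgentEnv()
obs = env.reset()
actions = {aid: 0 for aid in obs}      # any policy could fill this dict
obs, rew, dones, info = env.step(actions)
```

The env only ever sees the action dict, which is why the choice of policy architecture is orthogonal to MultiAgentEnv compatibility.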

Thanks Arturn, my team and I are certainly considering it. We’ll keep you updated. Thanks again for the help. Best,