Desperately need help implementing multi-agent algorithm with shared replay buffer

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello,

I’m implementing a custom Multi-Agent Reinforcement Learning (MARL) algorithm for my master’s thesis that involves manipulating the replay buffers of individual agents. Specifically, I have two independent DQN agents. After each episode, I want to review the episode, filter out “special” moments, and selectively store some of these moments in the replay buffer of the first agent and others in the replay buffer of the second agent.
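To make the intended data flow concrete, here is a minimal, framework-agnostic sketch of the post-episode routing step. It deliberately does not use any RLlib API. `Transition`, `ReplayBuffer`, `route_episode`, and the two callbacks `is_special` and `choose_buffer` are all hypothetical names made up for illustration, and the "special moment" rule in the usage below (non-zero reward) is only a placeholder assumption.

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, List, NamedTuple


class Transition(NamedTuple):
    """One environment step as seen by one agent (hypothetical layout)."""
    agent_id: str
    obs: tuple
    action: int
    reward: float
    next_obs: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform sampling."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently evicts the oldest item when full.
        self._storage: Deque[Transition] = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Never request more items than are currently stored.
        k = min(batch_size, len(self._storage))
        return random.sample(list(self._storage), k)

    def __len__(self) -> int:
        return len(self._storage)


def route_episode(
    episode: List[Transition],
    buffers: Dict[str, ReplayBuffer],
    is_special: Callable[[Transition], bool],
    choose_buffer: Callable[[Transition], str],
) -> int:
    """Store each "special" transition in the buffer picked by choose_buffer.

    Returns the number of transitions that were stored.
    """
    stored = 0
    for transition in episode:
        if not is_special(transition):
            continue  # ordinary moments are dropped in this sketch
        buffers[choose_buffer(transition)].add(transition)
        stored += 1
    return stored


if __name__ == "__main__":
    buffers = {"agent_0": ReplayBuffer(1000), "agent_1": ReplayBuffer(1000)}
    episode = [
        Transition("agent_0", (0,), 1, 0.0, (1,), False),
        Transition("agent_0", (1,), 0, 1.0, (2,), False),
        Transition("agent_1", (0,), 1, -1.0, (1,), True),
    ]
    # Placeholder rule: non-zero reward is "special"; positive rewards go to
    # the first agent's buffer, negative rewards to the second agent's.
    route_episode(
        episode,
        buffers,
        is_special=lambda t: t.reward != 0.0,
        choose_buffer=lambda t: "agent_0" if t.reward > 0 else "agent_1",
    )
    print({name: len(buf) for name, buf in buffers.items()})
```

The key requirement is that a single piece of code sees the whole finished episode and holds references to both buffers at once, which is what I do not know how to set up inside Ray.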

Could anyone provide guidance or examples on how to achieve this? I could not find any examples of a custom algorithm that manages two agents, and simply using a custom replay buffer does not seem to be enough, because I need access to both agents' buffers at runtime.

Any help is appreciated, as I have no experience with custom algorithms, policies, or replay buffers.