RLlib Multi-Agent/ReplayBuffer DQN/SAC Error: Agents with Different Observation Space Shapes

Hi everyone!
I hope you’re all having a great day. I’m currently working on a project using RLlib for Multi-Agent RL and I’ve encountered a problem that I could use some help with. I’ll try to provide as much detail as possible, so please bear with me!

I’m working with multiple agents, each having a different observation space. Everything seems to be working fine when I use on-policy algorithms like PPO and A3C. However, when I switch to off-policy algorithms like SAC and DQN, I’m getting an error that I’m not sure how to fix. The error message is:

"AssertionError: built_steps (1) + ongoing_steps (1) != rollout_fragment_length (1)."

To give you more context, here’s the config I’m using to run my experiment:

import os

from ray.rllib.algorithms.dqn import DQNConfig

algorithm = DQNConfig()
batch_size = 1024
config = (
    algorithm
    .environment(myMultiAgentEnv)
    .framework("torch")
    .rollouts(num_rollout_workers=8)
    .training(
        gamma=0.99,
        lr=0.00005,
        dueling=True,
        double_q=True,
        before_learn_on_batch=True,
        replay_buffer_config={
            "_enable_replay_buffer_api": True,
            "type": "MultiAgentReplayBuffer",
            "capacity": 50000,
            "replay_sequence_length": 1,
        },
    )
    .multi_agent(
        # One policy per agent, each with its own observation/action space.
        policies={
            "agent_one": (
                None,
                one_env.observation_space,
                one_env.action_space,
                {},
            ),
            "agent_two": (
                None,
                two_env.observation_space,
                two_env.action_space,
                {},
            ),
            "agent_three": (
                None,
                three_env.observation_space,
                three_env.action_space,
                {},
            ),
        },
        policy_mapping_fn=policy_mapping_fn,
    )
    .callbacks(CustomCallbacks)
    .resources(
        num_cpus_per_worker=2,
        num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")),
    )
)

As you can see, I’ve set up three agents with their own observation and action spaces in the multi_agent configuration.
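For reference, the policy_mapping_fn I pass in is essentially an identity mapping; here is a minimal sketch (the agent IDs are assumptions based on my policy names, and the exact callback signature may differ slightly across Ray versions):

def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Each agent ID ("agent_one", "agent_two", "agent_three") maps to the
    # policy of the same name defined in .multi_agent(policies=...).
    return agent_id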

My main question is: What does this error message mean, and how can I resolve it?
I’m also curious whether the replay buffer in RLlib can handle multiple agents with different observation space shapes.

I would really appreciate any insights or suggestions you can offer; I’m eager to learn from your experiences.

Thank you so much for your time and consideration! :blush:

Thank you for raising this! RLlib replay buffers should work with multi-agent environments where agents have different observation spaces. Could you provide a repro script with your environment and policy_mapping_fn?

I tested the multi_agent_different_spaces_for_agents.py example from the ray-project/ray GitHub repo (master branch) with PPO, DQN, and SAC, and could not reproduce the issue.
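In case it helps while you put a repro together, here is roughly the kind of minimal setup I would use to sanity-check DQN with per-agent observation spaces. This is only a sketch written against a recent Ray 2.x with gymnasium; the environment, agent IDs, and spaces are illustrative, not your actual setup:

import gymnasium as gym
import numpy as np

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoSpacesEnv(MultiAgentEnv):
    """Toy env with two agents whose observation shapes differ."""

    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {"agent_one", "agent_two"}
        # Per-agent spaces keyed by agent ID (RLlib's "preferred format").
        self.observation_space = gym.spaces.Dict({
            "agent_one": gym.spaces.Box(-1.0, 1.0, (4,), np.float32),
            "agent_two": gym.spaces.Box(-1.0, 1.0, (10,), np.float32),
        })
        self.action_space = gym.spaces.Dict({
            "agent_one": gym.spaces.Discrete(2),
            "agent_two": gym.spaces.Discrete(2),
        })
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        obs = {aid: self.observation_space[aid].sample() for aid in self._agent_ids}
        return obs, {}

    def step(self, action_dict):
        self._t += 1
        obs = {aid: self.observation_space[aid].sample() for aid in action_dict}
        rewards = {aid: 1.0 for aid in action_dict}
        done = self._t >= 10
        terminateds = {"__all__": done, **{aid: done for aid in action_dict}}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}


config = (
    DQNConfig()
    .environment(TwoSpacesEnv)
    .framework("torch")
    .multi_agent(
        # One policy per agent, each with a different observation shape.
        policies={
            "agent_one": (None, gym.spaces.Box(-1.0, 1.0, (4,), np.float32),
                          gym.spaces.Discrete(2), {}),
            "agent_two": (None, gym.spaces.Box(-1.0, 1.0, (10,), np.float32),
                          gym.spaces.Discrete(2), {}),
        },
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
)

algo = config.build()
result = algo.train()
print(result.get("episode_reward_mean"))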

Hello, did you ever find a solution to this problem? I’m experiencing the same issue!