DQN MultiAgentReplayBuffer not working

To use the old API stack, set .api_stack(enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False) in your config. For multi-agent DQN with a replay buffer on the old stack, set "type": "MultiAgentPrioritizedReplayBuffer" in replay_buffer_config (not "MultiAgentReplayBuffer"), because the old stack does not support the new episode-based buffer. Example:

from ray.rllib.algorithms.dqn import DQNConfig

# env_config, EPS_SCHEDULE, SEED, and MetricsLoggerCallback are assumed to be
# defined elsewhere in your script, and the "sumo_marl" environment must already
# be registered (e.g. via ray.tune.registry.register_env).
config = (
    DQNConfig()
    .framework("torch")
    .environment("sumo_marl", env_config=env_config)
    .api_stack(enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False)
    .env_runners(num_env_runners=1, num_envs_per_env_runner=1, num_cpus_per_env_runner=3, sample_timeout_s=50000)
    .multi_agent(
        policies=["shared"],
        policy_mapping_fn=lambda agent_id, *a, **kw: "shared",
    )
    .learners(num_learners=0, num_cpus_per_learner=3)
    .training(
        gamma=0.99,
        lr=1e-4,
        num_steps_sampled_before_learning_starts=20_000,
        train_batch_size=4096,
        replay_buffer_config={
            "type": "MultiAgentPrioritizedReplayBuffer",
            "capacity": 300_000,
        },
        target_network_update_freq=8000,
        double_q=True,
        dueling=True,
        n_step=1,
        epsilon=EPS_SCHEDULE,
    )
    .callbacks(MetricsLoggerCallback)
    .debugging(seed=SEED)
)

This avoids the new stack’s episode buffer and uses the supported prioritized buffer for multi-agent DQN.
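
If it helps, here is a minimal sketch of how you might build and run this config. It assumes the placeholders above (env_config, EPS_SCHEDULE, SEED, MetricsLoggerCallback) are defined in your script and that "sumo_marl" has already been registered:

import ray

ray.init()

# Build the Algorithm from the config above and run a few training iterations.
algo = config.build()
for i in range(5):
    result = algo.train()
    # On the old API stack the mean episode reward is reported under
    # "episode_reward_mean"; the exact result key can vary across Ray versions.
    print(i, result.get("episode_reward_mean"))
algo.stop()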
