1. Severity of the issue: (select one)
- None: I’m just curious or want clarification.
- Low: Annoying but doesn’t hinder my work.
- Medium: Significantly affects my productivity but can find a workaround.
- High: Completely blocks me.
2. Environment:
- Ray version: 2.48
- Python version: 3.11
- OS: Windows 11
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: Multi-agent DQN with a prioritized replay buffer trains normally on my custom MultiAgentEnv.
- Actual: Training always crashes with AttributeError / TypeError errors from the new API stack (see below).
Hi everyone,
I am working on a multi-agent reinforcement learning project for energy management using RLlib, and I’ve run into a critical issue with the latest RLlib (2.48). My environment implements two agents, each with its own custom observation and action space. The setup does not work with DQN (with a prioritized replay buffer): training always fails with errors related to the new API stack and batch handling.
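For context, here is a heavily stripped-down sketch of my environment structure (the real observation/action spaces, CSV handling, and reward logic are more involved; spaces and rewards below are only placeholders):

import numpy as np
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class SmartEnergyEnv(MultiAgentEnv):
    """Simplified two-agent energy-management env (heat pump + battery)."""

    def __init__(self, config=None):
        super().__init__()
        config = config or {}
        self.csv_path = config.get("csv_path")  # real env loads load/price profiles here

        # Per-agent spaces (placeholders; the real ones are larger).
        self.observation_space_wp = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space_wp = gym.spaces.Discrete(3)
        self.observation_space_batt = gym.spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
        self.action_space_batt = gym.spaces.Discrete(5)

        self.agents = self.possible_agents = ["wp_agent", "battery_agent"]
        self.observation_spaces = {
            "wp_agent": self.observation_space_wp,
            "battery_agent": self.observation_space_batt,
        }
        self.action_spaces = {
            "wp_agent": self.action_space_wp,
            "battery_agent": self.action_space_batt,
        }
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        return obs, {}

    def step(self, action_dict):
        self._t += 1
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        rewards = {aid: 0.0 for aid in self.agents}  # real env computes energy cost here
        terminateds = {"__all__": self._t >= 96}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}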
What I’ve tried:
- RLlib 2.48, PyTorch backend, standard MultiAgentEnv subclass.
- I follow the recommended API (using .api_stack(...)), but also tried without it.
- I configure the prioritized replay buffer exactly as described in the docs.
- I also tested older RLlib versions, which work better, but I would like to use the latest version for other features.
Main error with DQN + MultiAgent + Prioritized Replay in RLlib 2.48:
AttributeError: 'list' object has no attribute 'as_multi_agent'
…
TypeError: unhashable type: 'slice'
My config:

from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment(
        env=SmartEnergyEnv,
        env_config={"csv_path": csv_path},
    )
    .multi_agent(
        policies={
            # One policy per agent; spaces are read from a temporary env instance.
            "wp_agent": (None, tmp_env.observation_space_wp, tmp_env.action_space_wp, {}),
            "battery_agent": (None, tmp_env.observation_space_batt, tmp_env.action_space_batt, {}),
        },
        # Map each agent id directly onto the policy of the same name.
        policy_mapping_fn=lambda aid, *args, **kwargs: aid,
    )
    .framework("torch")
    .training(
        train_batch_size=64,
        gamma=0.99,
        lr=1e-3,
        replay_buffer_config={
            "type": "MultiAgentPrioritizedReplayBuffer",
            "capacity": 50000,
            "prioritized_replay_alpha": 0.6,
            "prioritized_replay_beta": 0.4,
            "prioritized_replay_eps": 1e-6,
            "replay_sequence_length": 1,
        },
    )
)
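For completeness, this is roughly how I create the temporary env (before building the config above, since the config reads its spaces) and launch training; the actual script is longer, this is just the trimmed-down flow:

# Temporary env instance, only used to read the per-agent spaces in the config above.
tmp_env = SmartEnergyEnv({"csv_path": csv_path})

algo = config.build()
for _ in range(5):
    result = algo.train()
    print(result["training_iteration"])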
I also tried the old API stack by adding this to the config chain:

.api_stack(
    enable_env_runner_and_connector_v2=False,
    enable_rl_module_and_learner=False,
)

No success.
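I also wondered whether DQN on the new API stack expects the episode-based prioritized buffer instead of MultiAgentPrioritizedReplayBuffer. Something along these lines is my guess (I am not certain about the type or parameter names in 2.48, and I could not find it documented for the multi-agent case):

# My guess at a new-API-stack buffer config -- type/parameter names unverified:
config = config.training(
    replay_buffer_config={
        "type": "PrioritizedEpisodeReplayBuffer",
        "capacity": 50000,
        "alpha": 0.6,
        "beta": 0.4,
    },
)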
My questions:
- Has anyone succeeded in running a multi-agent DQN with prioritized replay in RLlib 2.48 (or another recent version)?
- Is there a known workaround or fix for this incompatibility in the new API stack?
- Is there a recommended setup for DQN with prioritized replay in the current RLlib, or should I stick with PPO for now?
- Are there any official statements on whether this will be supported in the near future?
I would appreciate any pointers or shared experiences! Maybe someone has a working example they could share. Thanks in advance!