1. Severity of the issue: (select one)
- None: I’m just curious or want clarification.
- Low: Annoying but doesn’t hinder my work.
- Medium: Significantly affects my productivity but can find a workaround.
- High: Completely blocks me.
2. Environment:
- Ray version: 2.48
- Python version: 3.11
- OS: Windows 11
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: Multi-agent DQN with a prioritized replay buffer trains normally on my custom MultiAgentEnv.
- Actual: Training always crashes with AttributeError / TypeError errors from the new API stack (see below).
Hi everyone,
I am working on a multi-agent reinforcement learning project for energy management using RLlib, and I’ve run into a critical issue with the latest RLlib (2.48). My environment implements two agents, each with its own custom observation and action space. The setup does not work with DQN (with a prioritized replay buffer): training always fails with errors related to the new API stack and batch handling.
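For context, here is a heavily stripped-down sketch of my environment structure (the real observation/action spaces, CSV handling, and reward logic are more involved; spaces and rewards below are only placeholders):

import numpy as np
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class SmartEnergyEnv(MultiAgentEnv):
    """Simplified two-agent energy-management env (heat pump + battery)."""

    def __init__(self, config=None):
        super().__init__()
        config = config or {}
        self.csv_path = config.get("csv_path")  # real env loads load/price profiles here

        # Per-agent spaces (placeholders; the real ones are larger).
        self.observation_space_wp = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space_wp = gym.spaces.Discrete(3)
        self.observation_space_batt = gym.spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
        self.action_space_batt = gym.spaces.Discrete(5)

        self.agents = self.possible_agents = ["wp_agent", "battery_agent"]
        self.observation_spaces = {
            "wp_agent": self.observation_space_wp,
            "battery_agent": self.observation_space_batt,
        }
        self.action_spaces = {
            "wp_agent": self.action_space_wp,
            "battery_agent": self.action_space_batt,
        }
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        return obs, {}

    def step(self, action_dict):
        self._t += 1
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        rewards = {aid: 0.0 for aid in self.agents}  # real env computes energy cost here
        terminateds = {"__all__": self._t >= 96}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}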
What I’ve tried:
- RLlib 2.48, PyTorch backend, standard MultiAgentEnv subclass.
- I follow the recommended API (using .api_stack(...)), but also tried without it.
- I configure the prioritized replay buffer exactly as described in the docs.
- I also tested older RLlib versions, which work better, but I would like to use the latest version for other features.
Main error with DQN + MultiAgent + Prioritized Replay in RLlib 2.48:
AttributeError: 'list' object has no attribute 'as_multi_agent'
…
TypeError: unhashable type: 'slice'
My config:

from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment(
        env=SmartEnergyEnv,
        env_config={"csv_path": csv_path},
    )
    .multi_agent(
        policies={
            # One policy per agent; spaces are read from a temporary env instance.
            "wp_agent": (None, tmp_env.observation_space_wp, tmp_env.action_space_wp, {}),
            "battery_agent": (None, tmp_env.observation_space_batt, tmp_env.action_space_batt, {}),
        },
        # Map each agent id directly onto the policy of the same name.
        policy_mapping_fn=lambda aid, *args, **kwargs: aid,
    )
    .framework("torch")
    .training(
        train_batch_size=64,
        gamma=0.99,
        lr=1e-3,
        replay_buffer_config={
            "type": "MultiAgentPrioritizedReplayBuffer",
            "capacity": 50000,
            "prioritized_replay_alpha": 0.6,
            "prioritized_replay_beta": 0.4,
            "prioritized_replay_eps": 1e-6,
            "replay_sequence_length": 1,
        },
    )
)
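For completeness, this is roughly how I create the temporary env (before building the config above, since the config reads its spaces) and launch training; the actual script is longer, this is just the trimmed-down flow:

# Temporary env instance, only used to read the per-agent spaces in the config above.
tmp_env = SmartEnergyEnv({"csv_path": csv_path})

algo = config.build()
for _ in range(5):
    result = algo.train()
    print(result["training_iteration"])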
I also tried the old API stack by adding this to the config chain:

.api_stack(
    enable_env_runner_and_connector_v2=False,
    enable_rl_module_and_learner=False,
)

No success.
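I also wondered whether DQN on the new API stack expects the episode-based prioritized buffer instead of MultiAgentPrioritizedReplayBuffer. Something along these lines is my guess (I am not certain about the type or parameter names in 2.48, and I could not find it documented for the multi-agent case):

# My guess at a new-API-stack buffer config -- type/parameter names unverified:
config = config.training(
    replay_buffer_config={
        "type": "PrioritizedEpisodeReplayBuffer",
        "capacity": 50000,
        "alpha": 0.6,
        "beta": 0.4,
    },
)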
My questions:
- Has anyone succeeded in running a multi-agent DQN with prioritized replay in RLlib 2.48 (or another recent version)?
- Is there a known workaround or fix for this incompatibility in the new API stack?
- Is there a recommended setup for DQN with prioritized replay in the current RLlib, or should I stick with PPO for now?
- Are there any official statements on whether this will be supported in the near future?
I would appreciate any pointers or shared experiences! Maybe someone has a working example they could share. Thanks in advance!