Hi everyone!
I hope you’re all having a great day. I’m currently working on a project using RLlib for Multi-Agent RL and I’ve encountered a problem that I could use some help with. I’ll try to provide as much detail as possible, so please bear with me!
I’m working with multiple agents, each with a different observation space. Everything works fine when I use on-policy algorithms like PPO and A3C, but when I switch to off-policy algorithms like SAC and DQN, I get an error that I’m not sure how to fix. The error message is:
"AssertionError: built_steps (1) + ongoing_steps (1) != rollout_fragment_length (1)."
To give you more context, here’s the config I’m using to run my experiment:
import os

from ray.rllib.algorithms.dqn import DQNConfig

# myMultiAgentEnv, one_env/two_env/three_env, policy_mapping_fn and
# CustomCallbacks are defined elsewhere in my project.
algorithm = DQNConfig()
batch_size = 1024

config = (
    algorithm
    .environment(myMultiAgentEnv)
    .framework("torch")
    .rollouts(num_rollout_workers=8)
    .training(
        gamma=0.99,
        lr=0.00005,
        dueling=True,
        double_q=True,
        before_learn_on_batch=True,
        replay_buffer_config={
            "_enable_replay_buffer_api": True,
            "type": "MultiAgentReplayBuffer",
            "capacity": 50000,
            "replay_sequence_length": 1,
        },
    )
    .multi_agent(
        policies={
            "agent_one": (
                None,
                one_env.observation_space,
                one_env.action_space,
                {},
            ),
            "agent_two": (
                None,
                two_env.observation_space,
                two_env.action_space,
                {},
            ),
            "agent_three": (
                None,
                three_env.observation_space,
                three_env.action_space,
                {},
            ),
        },
        policy_mapping_fn=policy_mapping_fn,  # sketched below
    )
    .callbacks(CustomCallbacks)
    .resources(
        num_cpus_per_worker=2,
        num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")),
    )
)
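In case it helps, here is roughly what my policy_mapping_fn looks like (a simplified sketch; my real agent IDs and routing logic are slightly different):

# Simplified sketch only -- the actual agent IDs and logic differ a bit.
# Each agent ID coming from the env is routed to the policy of the same name.
def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    if agent_id == "agent_one":
        return "agent_one"
    if agent_id == "agent_two":
        return "agent_two"
    return "agent_three"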
As you can see, I’ve set up three agents with their own observation and action spaces in the multi_agent configuration.
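To make the "different observation spaces" part concrete, the spaces differ roughly like this (the shapes below are made up for illustration; the real ones come from my envs):

from gym import spaces  # gymnasium in newer Ray versions
import numpy as np

# Illustrative shapes only -- each agent observes something of a different shape.
obs_space_one = spaces.Box(low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)
obs_space_two = spaces.Box(low=-np.inf, high=np.inf, shape=(20,), dtype=np.float32)
obs_space_three = spaces.Box(low=0.0, high=1.0, shape=(8, 8), dtype=np.float32)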
My main question is: What does this error message mean, and how can I resolve it?
I’m also curious whether RLlib’s replay buffer (the MultiAgentReplayBuffer in the config above) can handle multiple agents whose observation spaces have different shapes.
I would really appreciate any insights or suggestions you can offer; I’m eager to learn from your experiences.
Thank you so much for your time and consideration!