Multi-Agent PPO rollout stalls + “duplicate agent at same env-t” assertion under Connector-v2 (Ray 2.49.2)

This appears to be a known challenge: in multi-agent RLlib setups with LSTM policies, Connector V2's partial action delivery can desynchronize LSTM hidden states when not every agent acts at every step. Stale or mismatched state_in/state_out transitions then produce assertion errors or duplicate (agent_id, t) entries, as you described. The recommended approach appears to be explicit per-agent LSTM state tracking in your environment or rollout logic: update each agent's hidden state only when it actually acts, and otherwise preserve it until the next step at which it is scheduled to act. This pattern is discussed in this Ray forum thread, where users confirm that per-agent state management is necessary for correct LSTM operation in multi-agent, partially observing environments.
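To illustrate the bookkeeping, here is a minimal, framework-agnostic Python sketch of per-agent recurrent-state tracking. All names here (`initial_state`, `step_agent`, `rollout`) are hypothetical stand-ins, not RLlib API; the point is only the pattern: keep one state per agent and advance it only on steps where that agent actually acts.

```python
def initial_state():
    # Stand-in for a zeroed LSTM hidden state (h, c); illustrative only.
    return (0.0, 0.0)

def step_agent(obs, state):
    # Stand-in for one recurrent forward pass: returns (action, new_state).
    h, c = state
    return int(obs > 0), (h + obs, c + 1.0)

def rollout(obs_sequence):
    """obs_sequence: list of {agent_id: obs} dicts. Agents absent from a
    dict did not act at that env step, so their state must be preserved
    untouched until they next appear."""
    states = {}   # agent_id -> last hidden state, updated only when the agent acts
    actions = []
    for obs_dict in obs_sequence:
        step_actions = {}
        for agent_id, obs in obs_dict.items():
            state = states.get(agent_id, initial_state())
            action, new_state = step_agent(obs, state)
            states[agent_id] = new_state  # advance only for acting agents
            step_actions[agent_id] = action
        actions.append(step_actions)
    return actions, states
```

Because skipping agents never touch `states`, each agent's hidden state always reflects exactly the steps on which it acted, which is what avoids stale state_in/state_out pairings.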

My understanding is that there is no built-in RLlib mechanism for automatically synchronizing LSTM states of agents that skip steps under Connector V2; manual bookkeeping is required. If you want a step-by-step example or more detail on how to implement this, let me know!

Hint: Mention @RunLLM in the post for followups.