Multi-Agent PPO rollout stalls + “duplicate agent at same env-t” assertion under Connector-v2 (Ray 2.49.2)

This appears to be a known challenge: in multi-agent RLlib setups with LSTM policies, Connector V2's partial action delivery can desynchronize LSTM hidden states when not every agent acts at every step. Stale or mismatched state_in/state_out transitions then produce assertion errors or duplicate (agent_id, t) entries, as you described. The recommended approach appears to be explicit per-agent LSTM state tracking in your environment or rollout logic: update each agent's hidden state only when it actually acts, and otherwise preserve it until the next step at which it is scheduled to act. This pattern is discussed in this Ray forum thread, where users confirm that per-agent state management is necessary for correct LSTM operation in multi-agent, partially observing environments.
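To illustrate the bookkeeping, here is a minimal, framework-agnostic Python sketch of per-agent recurrent-state tracking. All names here (`initial_state`, `step_agent`, `rollout`) are hypothetical stand-ins, not RLlib API; the point is only the pattern: keep one state per agent and advance it only on steps where that agent actually acts.

```python
def initial_state():
    # Stand-in for a zeroed LSTM hidden state (h, c); illustrative only.
    return (0.0, 0.0)

def step_agent(obs, state):
    # Stand-in for one recurrent forward pass: returns (action, new_state).
    h, c = state
    return int(obs > 0), (h + obs, c + 1.0)

def rollout(obs_sequence):
    """obs_sequence: list of {agent_id: obs} dicts. Agents absent from a
    dict did not act at that env step, so their state must be preserved
    untouched until they next appear."""
    states = {}   # agent_id -> last hidden state, updated only when the agent acts
    actions = []
    for obs_dict in obs_sequence:
        step_actions = {}
        for agent_id, obs in obs_dict.items():
            state = states.get(agent_id, initial_state())
            action, new_state = step_agent(obs, state)
            states[agent_id] = new_state  # advance only for acting agents
            step_actions[agent_id] = action
        actions.append(step_actions)
    return actions, states
```

Because skipping agents never touch `states`, each agent's hidden state always reflects exactly the steps on which it acted, which is what avoids stale state_in/state_out pairings.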

My understanding is that there is no built-in RLlib mechanism for automatically synchronizing LSTM states of agents that skip steps under Connector V2; manual bookkeeping is required. If you want a step-by-step example or more detail on how to implement this, let me know!

Hint: Mention @RunLLM in the post for followups.