Using an LSTM model for multi-agent policy gradient with different recurrent hidden states per agent

It seems likely that this issue remains a challenge: multi-agent setups sharing an LSTM policy under Connector V2, where not all agents act every step, can hit assertion errors or get stuck. The most recent discussions and documentation indicate that RLlib's multi-agent API expects the observation dict to contain only the agents that should act next, and Connector V2 pipelines may not yet fully handle asynchronous agent stepping with shared LSTM state. An agent that skips several steps and then reappears can therefore produce repeated observations and misaligned LSTM state. There is no explicit mention of an official fix or update for this scenario in the latest documentation or issues, and users who encounter it are encouraged to carefully manage agent state and observation emission in their environments to avoid such connector/LSTM mismatches.
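To make that expectation concrete, here is a minimal sketch of a turn-based `MultiAgentEnv` that returns an observation only for the agent due to act next. It assumes a recent RLlib version with the new API stack; the agent ids, spaces, episode length, and reward logic are illustrative assumptions, not taken from the original post.

```python
import gymnasium as gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TurnBasedEnv(MultiAgentEnv):
    """Sketch: agents alternate turns; only the agent expected to act next
    ever appears in the returned observation dict."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_0", "agent_1"]
        self.observation_spaces = {
            aid: gym.spaces.Box(-1.0, 1.0, (4,), np.float32) for aid in self.agents
        }
        self.action_spaces = {aid: gym.spaces.Discrete(2) for aid in self.agents}
        self._turn = 0
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._turn, self._t = 0, 0
        # Emit an observation only for the first agent to act.
        aid = self.agents[self._turn]
        return {aid: self.observation_spaces[aid].sample()}, {}

    def step(self, action_dict):
        self._t += 1
        acting = self.agents[self._turn]
        assert set(action_dict) == {acting}, "Got an action for a non-acting agent."
        # Hand the turn over and emit an observation ONLY for the next actor,
        # so its LSTM state advances exactly once per observation it receives.
        self._turn = (self._turn + 1) % len(self.agents)
        next_aid = self.agents[self._turn]
        obs = {next_aid: self.observation_spaces[next_aid].sample()}
        rewards = {acting: 1.0}
        terminateds = {"__all__": self._t >= 20}
        truncateds = {"__all__": False}
        return obs, rewards, terminateds, truncateds, {}
```

Keeping the observation dict restricted to the acting agents is what lets the connector pipeline advance each agent's recurrent state in lockstep with the observations it actually consumes.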

To avoid the assertion errors, you may need custom logic that keeps LSTM state and observations aligned for each agent, for example by padding or masking skipped steps, or by ensuring that the pipeline never emits a duplicate observation for the same agent at the same timestep. There is no clear, out-of-the-box solution in the current RLlib release, and the problem is not directly addressed in the latest multi-agent or connector documentation. Would you like a more detailed breakdown or code-level suggestions?
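One defensive pattern for the "no duplicate observations" part is a small bookkeeping guard inside the environment. The sketch below is a hypothetical helper (the class and method names are inventions for illustration, not RLlib API); it assumes the mismatch originates in the env re-emitting an observation for an agent that has not yet acted on its previous one.

```python
class ObservationEmissionGuard:
    """Hypothetical helper: tracks agents that have received an observation
    but not yet returned an action, so the env never emits a second
    observation for them (which would desync their recurrent state)."""

    def __init__(self):
        self._awaiting_action = set()

    def on_actions_received(self, action_dict):
        # Each incoming action resolves one outstanding observation.
        for agent_id in action_dict:
            self._awaiting_action.discard(agent_id)

    def filter_obs(self, obs_dict):
        # Raise (or alternatively drop) when an agent already has a pending
        # observation, instead of letting the mismatch surface later as a
        # connector/LSTM assertion error.
        clean = {}
        for agent_id, ob in obs_dict.items():
            if agent_id in self._awaiting_action:
                raise AssertionError(
                    f"{agent_id} already has a pending observation; emitting "
                    "another would misalign its recurrent state."
                )
            clean[agent_id] = ob
            self._awaiting_action.add(agent_id)
        return clean
```

Usage would be to call `guard.on_actions_received(action_dict)` at the top of `step()` and pass the outgoing dict through `guard.filter_obs(...)` before returning, so any duplicate emission fails loudly at its source rather than deep inside the connector pipeline.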
