I have a custom multi-agent environment with multiple groups of agents, where I would like each group to use the same policy. I configured this by setting AlgorithmConfig.multi_agent(policies={...}, policy_mapping_fn=...). Having the policy remember a history of past states and actions would be very useful, since the optimal action for a given state in my custom environment depends on prior states and actions. Naturally, this leads to using a recurrent layer (use_lstm in the model config dictionary).
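For reference, the setup is roughly along these lines (PPO, the env name, and the group prefixes are just placeholders for my actual code):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # placeholder for my registered env
    .multi_agent(
        # One policy per group; every agent in a group maps to the same one.
        policies={"group_a_policy", "group_b_policy"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: (
            "group_a_policy" if agent_id.startswith("group_a") else "group_b_policy"
        ),
    )
    .training(
        model={
            "use_lstm": True,              # recurrent layer on top of the FC stack
            "lstm_cell_size": 256,
            "max_seq_len": 20,
            "lstm_use_prev_action": True,  # feed previous actions into the LSTM
        },
    )
)
```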
However, each agent within a group has a different RL environment state and takes different actions. Although I would like parameter sharing across the agents in a group, I don’t want each agent to have the same LSTM hidden/cell states, since each agent acts independently. How can I do this?
I am looking through the RLlib source code, but it is very convoluted.
My goal is basically that the actor/policy network and the critic/value network should have separate LSTM hidden states for each agent, but still have parameter sharing during training.
Could you clarify which specific issue or topic you are referring to? There are multiple issues discussed in the data sources, such as Ray job deadlocks, OOM errors, or dashboard persistence. Please specify the context or provide more details so I can give you the most relevant update and sources.
If you are referring to Ray jobs getting stuck with idle actors and pending tasks, there have been recent fixes—users are encouraged to try the latest Ray version, as mentioned in this GitHub issue. Let me know if you want more detail or if your issue is different.
@RunLLM I was just wondering whether there has been any recent update from Ray on configuring a multi-agent shared LSTM policy. I’m hitting either 1. an assertion error, or 2. an env that cannot progress, because Connector V2 assigns partial actions in multi-agent settings. Since not all agents act on every step due to Connector V2’s chunking, when an agent that didn’t act last step acts now, its LSTM receives both the old observation and a re-emitted one (which are identical), so it ends up with two copies of the same observation at time t, triggering the assertion error. On the other hand, if I don’t advance the env state and instead wait until ALL agents have received actions, the loop gets stuck indefinitely. Is there an update that handles this situation gracefully?
It seems likely that this issue (multi-agent shared LSTM policies with Connector V2, where not all agents act every step, leading to assertion errors or stuck environments) remains a challenge. The most recent discussions and documentation indicate that RLlib’s multi-agent API expects only the agents that should act next to be present in the observation dict, and that Connector V2 pipelines may not yet fully handle asynchronous agent actions with shared LSTM state, especially when an agent skips steps and then reappears, causing repeated observations and LSTM state misalignment. There is no explicit mention of a new, official fix for this scenario in the latest documentation or issues; users encountering it are encouraged to carefully manage agent state and observation emission in their environments to avoid such connector/LSTM mismatches.
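One pattern that seems consistent with that expectation is a turn-based env whose step() only returns an observation for the agent that should act next. The sketch below is illustrative only; the env, its spaces, and its rewards are made-up placeholders rather than anything from your setup:

```python
import gymnasium as gym
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TurnBasedEnv(MultiAgentEnv):
    """Toy two-agent, turn-based env: only the agent expected to act next
    appears in the returned observation dict, so an idle agent never gets a
    stale or duplicated observation fed into its LSTM."""

    def __init__(self, config=None):
        super().__init__()
        # (On the new API stack you would set self.agents / self.possible_agents.)
        self._agent_ids = {"agent_0", "agent_1"}
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (4,), np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self._turn = 0
        self._t = 0

    def _obs(self):
        return np.zeros(4, dtype=np.float32)  # placeholder observation

    def reset(self, *, seed=None, options=None):
        self._turn, self._t = 0, 0
        # Emit an observation only for the agent that acts first.
        return {f"agent_{self._turn}": self._obs()}, {}

    def step(self, action_dict):
        # action_dict contains actions only for agents that got an obs last step.
        acting = f"agent_{self._turn}"
        assert acting in action_dict
        self._t += 1
        self._turn = (self._turn + 1) % 2
        done = self._t >= 100
        if done:
            # On termination, give every agent a final observation.
            obs = {aid: self._obs() for aid in ("agent_0", "agent_1")}
        else:
            # Otherwise, only the agent that should act next gets one.
            obs = {f"agent_{self._turn}": self._obs()}
        rewards = {acting: 0.0}  # placeholder reward
        return obs, rewards, {"__all__": done}, {"__all__": False}, {}
```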
If you want to avoid the assertion errors, you may need custom logic that keeps LSTM state and observations aligned for each agent, possibly by padding or masking, or by ensuring that the connector pipeline does not emit duplicate observations for the same agent at the same timestep. There is no clear, out-of-the-box solution for this in the current RLlib release, and the problem is not directly addressed in the latest multi-agent or connector documentation. A rough, code-level illustration of the deduplication idea is sketched below; let me know if you would like a more detailed breakdown.
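As an illustration of suppressing duplicate observations, an env-side wrapper could track which agents still have an outstanding, un-acted-on observation and hold back re-emissions until they act. This is only a sketch; the wrapper name and all of its details are assumptions, not an existing RLlib utility:

```python
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class SuppressDuplicateObs(MultiAgentEnv):
    """Hypothetical wrapper: if an agent already has an observation it has not
    acted on yet, do not forward another observation for it, so the agent's
    LSTM never sees two observations for the same timestep."""

    def __init__(self, env):
        super().__init__()
        self.env = env
        self._agent_ids = getattr(env, "_agent_ids", set())
        self.observation_space = env.observation_space
        self.action_space = env.action_space
        self._pending = set()  # agent ids with an outstanding observation

    def reset(self, *, seed=None, options=None):
        obs, infos = self.env.reset(seed=seed, options=options)
        self._pending = set(obs)
        return obs, infos

    def step(self, action_dict):
        # Agents that act now have consumed their outstanding observation.
        self._pending -= set(action_dict)
        obs, rewards, terminateds, truncateds, infos = self.env.step(action_dict)
        # Forward observations only for agents without an outstanding one.
        # (Terminal observations may need special handling in a real env.)
        filtered = {aid: o for aid, o in obs.items() if aid not in self._pending}
        self._pending |= set(filtered)
        return filtered, rewards, terminateds, truncateds, infos
```

Whether this or padding/masking is the better fit depends on how your environment decides which agents act on each step.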