1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.37.0
- Python version: Python 3.10.4
3. What happened vs. what you expected:
I’m running PPO + LSTM with the new API stack in a multi-agent environment, and I’m hitting the following error during training:
ValueError: all input arrays must have the same shape
I’m using an env-to-module connector pipeline like this:
```python
def _env_to_module(env, multi_agent=True, n_prev_rewards=1, n_prev_actions=1):
    return [
        PrevActionsPrevRewards(
            multi_agent=multi_agent,
            n_prev_rewards=n_prev_rewards,
            n_prev_actions=n_prev_actions,
        ),
        FlattenObservations(multi_agent=multi_agent),
    ]
```
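For context, this is roughly how the pipeline is wired into the config. The snippet below is simplified and from memory; the env name, policy ID, and LSTM settings are placeholders rather than my exact setup:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations, PrevActionsPrevRewards

config = (
    PPOConfig()
    # New API stack (EnvRunners + ConnectorV2 + RLModule/Learner).
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("my_multi_agent_env")  # placeholder for my registered env
    .env_runners(
        # The pipeline from above is passed as the env-to-module connector.
        env_to_module_connector=lambda env: _env_to_module(env, multi_agent=True),
    )
    # LSTM via the default RLModule.
    .rl_module(model_config_dict={"use_lstm": True, "max_seq_len": 20})
    .multi_agent(
        policies={"shared_policy"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: "shared_policy",
    )
)
```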
I expected the PrevActionsPrevRewards connector to just append the previous actions/rewards (constant size) to the observation, so that after FlattenObservations the observation shape is fixed for each agent.
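Concretely, this is the behavior I expect (the spaces here are hypothetical, only to illustrate the size arithmetic; my real spaces differ):

```python
import gymnasium as gym

# Hypothetical per-agent spaces, only to illustrate the expected flat size.
obs_space = gym.spaces.Box(-1.0, 1.0, shape=(6,))
act_space = gym.spaces.Discrete(4)

n_prev_actions = 1
n_prev_rewards = 1

# After PrevActionsPrevRewards + FlattenObservations, every per-agent
# observation should have this fixed length for the whole episode:
expected_len = (
    obs_space.shape[0]              # the (already flat) original obs
    + act_space.n * n_prev_actions  # one-hot encoded previous action(s)
    + n_prev_rewards                # previous reward(s)
)
print(expected_len)  # 6 + 4 * 1 + 1 = 11
```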
What actually happens

- At the very beginning of the episode, observations have the expected shapes and values.
- After some steps, all agents’ observations suddenly become longer (extra previous-action/previous-reward dimensions appear), which breaks batching and causes the ValueError above.
- I checked that PrevActionsPrevRewards.__call__ is not being called twice per step for the same agent, so there is no obvious duplicate processing.
- The issue only happens in multi-agent mode; with a single agent I cannot reproduce the error.
What I tried

- Printed the shapes of the observations inside PrevActionsPrevRewards (via a small debug subclass, sketched below); they are always correct at the time of return.
- Verified that the per-agent action/reward histories are consistent.
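For the shape printing and the double-call check I used a small debug subclass, roughly like the one below (simplified: it only wraps __call__, counts invocations per (episode id, episode length), and prints the shape of every array in the returned batch without assuming a particular nesting):

```python
from collections import defaultdict

import tree  # dm_tree, ships with RLlib

from ray.rllib.connectors.env_to_module import PrevActionsPrevRewards


class DebugPrevActionsPrevRewards(PrevActionsPrevRewards):
    """Same connector, but logs output shapes and how often it runs per env step."""

    _calls_per_step = defaultdict(int)

    def __call__(self, *args, **kwargs):
        out = super().__call__(*args, **kwargs)

        # Count invocations per (episode, env step) to detect duplicate
        # processing of the same step.
        for ep in kwargs.get("episodes") or []:
            self._calls_per_step[(ep.id_, len(ep))] += 1

        # Print the shape of every array in the returned batch, whatever its
        # nesting (per column / per episode / per agent).
        print(tree.map_structure(lambda x: getattr(x, "shape", type(x).__name__), out))
        return out
```

I swap this in for PrevActionsPrevRewards inside _env_to_module while debugging; the printed shapes are always the expected ones, and no (episode, step) key is ever counted twice.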
Has anyone seen a case where, in multi-agent PPO + LSTM with PrevActionsPrevRewards + FlattenObservations, the observation size suddenly changes mid-episode?