PPO + LSTM + Multi-agent with PrevActionsPrevRewards → "all input arrays must have the same shape" error

1. Severity of the issue: (select one)

High: Completely blocks me.

2. Environment:

  • Ray version: 2.37.0
  • Python version: 3.10.4

3. What happened vs. what you expected:

I’m running PPO + LSTM with the new API stack in a multi-agent environment, and I’m hitting the following error during training:

ValueError: all input arrays must have the same shape

I’m using an env-to-module connector pipeline like this:

```python
from ray.rllib.connectors.env_to_module import FlattenObservations, PrevActionsPrevRewards


def _env_to_module(env, multi_agent=True, n_prev_rewards=1, n_prev_actions=1):
    return [
        # Append each agent's previous action(s) and reward(s) to its observation.
        PrevActionsPrevRewards(
            multi_agent=multi_agent,
            n_prev_rewards=n_prev_rewards,
            n_prev_actions=n_prev_actions,
        ),
        # Flatten the resulting (dict) observation into a single 1-D vector.
        FlattenObservations(multi_agent=multi_agent),
    ]
```
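For reference, this is roughly how I wire the pipeline into the config (the env name below is a placeholder for my registered multi-agent env; the multi-agent policy mapping and the LSTM model config are omitted here):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # New API stack (RLModule/Learner + EnvRunner/ConnectorV2).
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("my_multi_agent_env")  # placeholder for my actual env
    .env_runners(env_to_module_connector=_env_to_module)
    # (multi-agent policy mapping, LSTM model config, etc. omitted)
)
```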

I expected PrevActionsPrevRewards to simply append the previous actions/rewards (a fixed number of values per agent) to each observation, so that after FlattenObservations the flattened observation size stays constant for every agent throughout the episode.
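For concreteness (hypothetical spaces, not my actual ones): with a Box observation of size obs_dim and a Discrete(n_act) action space, and assuming FlattenObservations one-hot encodes the previous Discrete actions, I would expect the flattened per-agent size to stay constant at something like:

```python
# Hypothetical sizes; n_prev_actions / n_prev_rewards as passed to the connector.
expected_size = obs_dim + n_prev_actions * n_act + n_prev_rewards
```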

What actually happens

  • At the very beginning of the episode, observations have the expected shapes and values.

  • After some steps, all agents’ observations suddenly become longer (extra previous-action/previous-reward entries appear), which breaks batching and causes the ValueError above.

  • I checked: PrevActionsPrevRewards.__call__ is not being called twice per step for the same agent (so no obvious duplicate processing).

  • The issue only happens in multi-agent; if I run with a single agent, I cannot reproduce the error.

What I tried

  • Printed the shapes of the observations inside PrevActionsPrevRewards (see the sketch after this list); they are always correct at the time the connector returns.

  • Verified that action/reward histories per agent are consistent.
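The shape printing was done with a small wrapper roughly like the sketch below (the wrapper class is my own debugging code, not part of RLlib, and it assumes the connector pipeline invokes connectors with keyword arguments):

```python
import numpy as np

from ray.rllib.connectors.env_to_module import PrevActionsPrevRewards


class ShapeLoggingPrevActionsPrevRewards(PrevActionsPrevRewards):
    """Drop-in replacement that prints the shape of every array it returns."""

    def __call__(self, **kwargs):
        out = super().__call__(**kwargs)
        self._log_shapes("out", out)
        return out

    def _log_shapes(self, path, value):
        # Recursively walk dicts/lists/tuples and print each ndarray's shape.
        if isinstance(value, np.ndarray):
            print(f"{path}: {value.shape}")
        elif isinstance(value, dict):
            for k, v in value.items():
                self._log_shapes(f"{path}/{k}", v)
        elif isinstance(value, (list, tuple)):
            for i, v in enumerate(value):
                self._log_shapes(f"{path}[{i}]", v)
```

Swapping this class into the pipeline in place of PrevActionsPrevRewards is how I confirmed the shapes still look correct at the point where the connector returns.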

Has anyone seen a case where, in multi-agent PPO + LSTM with PrevActionsPrevRewards + FlattenObservations, the observation size suddenly changes mid-episode?