1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.37.0
- Python version: Python 3.10.4
3. What happened vs. what you expected:
I’m running PPO + LSTM with the new API stack in a multi-agent environment, and I’m hitting the following error during training:
ValueError: all input arrays must have the same shape
I’m using an env-to-module connector pipeline like this:
```python
def _env_to_module(env, multi_agent=True, n_prev_rewards=1, n_prev_actions=1):
    return [
        PrevActionsPrevRewards(
            multi_agent=multi_agent,
            n_prev_rewards=n_prev_rewards,
            n_prev_actions=n_prev_actions,
        ),
        FlattenObservations(multi_agent=multi_agent),
    ]
```
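For context, this is roughly how the pipeline is wired into the config. The snippet below is simplified and from memory; the env name, policy ID, and LSTM settings are placeholders rather than my exact setup:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.connectors.env_to_module import FlattenObservations, PrevActionsPrevRewards

config = (
    PPOConfig()
    # New API stack (EnvRunners + ConnectorV2 + RLModule/Learner).
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("my_multi_agent_env")  # placeholder for my registered env
    .env_runners(
        # The pipeline from above is passed as the env-to-module connector.
        env_to_module_connector=lambda env: _env_to_module(env, multi_agent=True),
    )
    # LSTM via the default RLModule.
    .rl_module(model_config_dict={"use_lstm": True, "max_seq_len": 20})
    .multi_agent(
        policies={"shared_policy"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: "shared_policy",
    )
)
```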
I expected the PrevActionsPrevRewards connector to just append the previous actions/rewards (constant size) to the observation, so that after FlattenObservations the observation shape is fixed for each agent.
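Concretely, this is the behavior I expect (the spaces here are hypothetical, only to illustrate the size arithmetic; my real spaces differ):

```python
import gymnasium as gym

# Hypothetical per-agent spaces, only to illustrate the expected flat size.
obs_space = gym.spaces.Box(-1.0, 1.0, shape=(6,))
act_space = gym.spaces.Discrete(4)

n_prev_actions = 1
n_prev_rewards = 1

# After PrevActionsPrevRewards + FlattenObservations, every per-agent
# observation should have this fixed length for the whole episode:
expected_len = (
    obs_space.shape[0]              # the (already flat) original obs
    + act_space.n * n_prev_actions  # one-hot encoded previous action(s)
    + n_prev_rewards                # previous reward(s)
)
print(expected_len)  # 6 + 4 * 1 + 1 = 11
```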
What actually happens

- At the very beginning of the episode, observations have the expected shapes and values.
- After some steps, all agents’ observations suddenly become longer (extra previous-action/previous-reward dimensions appear), which breaks batching and causes the ValueError above.
- I checked that PrevActionsPrevRewards.__call__ is not being called twice per step for the same agent, so there is no obvious duplicate processing.
- The issue only happens in multi-agent mode; with a single agent I cannot reproduce the error.
What I tried

- Printed the shapes of the observations inside PrevActionsPrevRewards (via a small debug subclass, sketched below); they are always correct at the time of return.
- Verified that the per-agent action/reward histories are consistent.
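For the shape printing and the double-call check I used a small debug subclass, roughly like the one below (simplified: it only wraps __call__, counts invocations per (episode id, episode length), and prints the shape of every array in the returned batch without assuming a particular nesting):

```python
from collections import defaultdict

import tree  # dm_tree, ships with RLlib

from ray.rllib.connectors.env_to_module import PrevActionsPrevRewards


class DebugPrevActionsPrevRewards(PrevActionsPrevRewards):
    """Same connector, but logs output shapes and how often it runs per env step."""

    _calls_per_step = defaultdict(int)

    def __call__(self, *args, **kwargs):
        out = super().__call__(*args, **kwargs)

        # Count invocations per (episode, env step) to detect duplicate
        # processing of the same step.
        for ep in kwargs.get("episodes") or []:
            self._calls_per_step[(ep.id_, len(ep))] += 1

        # Print the shape of every array in the returned batch, whatever its
        # nesting (per column / per episode / per agent).
        print(tree.map_structure(lambda x: getattr(x, "shape", type(x).__name__), out))
        return out
```

I swap this in for PrevActionsPrevRewards inside _env_to_module while debugging; the printed shapes are always the expected ones, and no (episode, step) key is ever counted twice.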
Has anyone seen a case where, in multi-agent PPO + LSTM with PrevActionsPrevRewards + FlattenObservations, the observation size suddenly changes mid-episode?