KeyError: 'infos' in concat_samples during training with PPO + Learner API

I’m encountering a KeyError: 'infos' during training.

File "…/sample_batch.py", line 1688, in concat_samples
*[s[k] for s in concated_samples],
KeyError: 'infos'

During rollout (sampling), all SampleBatch objects contain the key 'infos', and it’s a list of dicts (List[Dict]), as expected.

Example debug print:

Sample 0 keys: […, 'infos']
type(infos): list
infos[0]: {'step': 1, 'travel_time': None}

However, when RLlib calls concat_samples() on the data passed to the learner, only the first sample has 'infos', and all subsequent samples are missing that key.
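To illustrate the failure mode, here is a minimal sketch that uses plain dicts as stand-ins for SampleBatch objects. It is not RLlib's actual implementation; `naive_concat` and `missing_keys_report` are hypothetical helpers I wrote for illustration. The only assumption is that the concatenation takes its key set from the first sample and then indexes every other sample with those keys, which is what the traceback line `s[k] for s in concated_samples` suggests.

```python
def naive_concat(samples):
    """Concatenate list-valued columns, taking the key set from the first sample."""
    out = {}
    for k in samples[0]:
        merged = []
        for s in samples:
            # Raises KeyError if any later sample lacks key k.
            merged.extend(s[k])
        out[k] = merged
    return out


def missing_keys_report(samples):
    """Map sample index -> sorted keys it lacks, relative to the union of all keys."""
    all_keys = set()
    for s in samples:
        all_keys |= set(s)
    report = {}
    for i, s in enumerate(samples):
        missing = sorted(all_keys - set(s))
        if missing:
            report[i] = missing
    return report


# Stand-in data: only the first sample carries 'infos'.
samples = [
    {"obs": [0, 1], "rewards": [0.0, 1.0], "infos": [{"step": 1}, {"step": 2}]},
    {"obs": [2], "rewards": [0.5]},
]

print(missing_keys_report(samples))

try:
    naive_concat(samples)
except KeyError as err:
    print("KeyError:", err)
```

A report like this, run on the list of batches right before concatenation, shows at a glance which samples lost which keys.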

Debug print inside sample_batch.py::concat_samples:

[ERROR] Missing key 'infos' during concat_samples.
Sample 0 has keys: ['loss_mask', 'terminateds', 'obs', 'actions', 'rewards', 'truncateds', 'action_dist_inputs', 'EMBEDDINGS', 'action_logp', 'state_in', 'seq_lens', 'advantages', 'value_targets', 'infos']
Sample 1 has keys: ['loss_mask', 'terminateds', 'obs', 'actions', 'rewards', 'truncateds', 'action_dist_inputs', 'EMBEDDINGS', 'action_logp', 'state_in', 'seq_lens', 'advantages', 'value_targets']
Sample 2 has keys: ['loss_mask', 'terminateds', 'obs', 'actions', 'rewards', 'truncateds', 'action_dist_inputs', 'EMBEDDINGS', 'action_logp', 'state_in', 'seq_lens', 'advantages', 'value_targets']
Sample 3 has keys: ['loss_mask', 'terminateds', 'obs', 'actions', 'rewards', 'truncateds', 'action_dist_inputs', 'EMBEDDINGS', 'action_logp', 'state_in', 'seq_lens', 'advantages', 'value_targets']
Sample 4 has keys: ['loss_mask', 'terminateds', 'obs', 'actions', 'rewards', 'truncateds', 'action_dist_inputs', 'EMBEDDINGS', 'action_logp', 'state_in', 'seq_lens', 'advantages', 'value_targets']

[DEBUG] Concatenating 'infos':
Sample 0 - type: <class 'list'>, keys: N/A
Sample 1 - type: <class 'dict'>, keys:
Sample 2 - type: <class ‘dict’>, keys:
Sample 3 - type: <class ‘dict’>, keys:
Sample 4 - type: <class ‘dict’>, keys:
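One workaround I am considering is to drop every key that is not present in all samples before they are concatenated. Below is a rough sketch with plain dicts standing in for SampleBatch objects; `drop_non_shared_keys` is a hypothetical helper, not an RLlib function. It assumes 'infos' is not needed for the loss computation, and where exactly such a step would be hooked into the RLlib pipeline is not shown here.

```python
def drop_non_shared_keys(samples):
    """Return copies of the samples restricted to the keys that all of them share."""
    if not samples:
        return []
    shared = set(samples[0])
    for s in samples[1:]:
        shared &= set(s)
    # Build new dicts so the original samples are left untouched.
    return [{k: v for k, v in s.items() if k in shared} for s in samples]


# Stand-in data mirroring the debug output: only sample 0 has 'infos'.
batches = [
    {"obs": [0, 1], "rewards": [0.0, 1.0], "infos": [{"step": 1}, {"step": 2}]},
    {"obs": [2], "rewards": [0.5]},
    {"obs": [3], "rewards": [0.2]},
]

cleaned = drop_non_shared_keys(batches)
for i, b in enumerate(cleaned):
    print(f"Sample {i} keys: {sorted(b)}")
```

This only hides the symptom. It does not explain why 'infos' disappears from every sample after the first.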

I've been stuck on this for a long time. How can I solve it?

Environment:

  • Ray version: 2.37.0
  • Python version: 3.10.4

@lsg Could you provide a repro so others can take a look at it?