MultiAgentEpisode.get_infos(-1) Can't Work Correctly in the Turn-Based MultiAgent Environment

mizhou0309 · November 18, 2025, 3:55am

1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

Ray version: 2.50
Python version: 3.12
OS: unix

3. What happened vs. what you expected:

Expected: Calling .get_infos(-1) in CustomCallbacks returns the most recent info from the turn-based multi-agent environment.
Actual: It works well except at the last step.

I give all the agents the same info, and only one agent is allowed to take an action at each step.

If you call episode.get_infos(-1) in the on_episode_step method of CustomCallbacks, it returns the most recent info at every step except the last one. For example, if the episode has 16 steps in total, .get_infos(-1) returns the 13th info at the last step. However, you can get the last info in the on_episode_end method by using a counter that tracks the correct index of that info (e.g., .get_infos(counter)).
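For reference, the counter workaround can be sketched roughly as below. This is only an illustration under stated assumptions, not RLlib code: `LastInfoTracker`, `StubEpisode`, and `run_demo` are hypothetical names, the tracker is a standalone class (in real use its two methods would sit on the custom callbacks class), and the stub episode simply assumes that index 0 holds the reset info and index n holds the info after the n-th environment step. It does not reproduce the reported off-by-some behavior of `.get_infos(-1)`.

```python
from collections import defaultdict


class LastInfoTracker:
    """Counts env steps per episode, then reads the final info by absolute index.

    Standalone sketch: the method names mirror the callback hooks mentioned
    in the post, but this class does not inherit from any RLlib class.
    """

    def __init__(self):
        self.step_count = defaultdict(int)  # episode id -> env steps seen so far
        self.last_info = {}                 # episode id -> info read at episode end

    def on_episode_step(self, *, episode, **kwargs):
        # One call per environment step, so the counter equals the step index.
        self.step_count[episode.id_] += 1

    def on_episode_end(self, *, episode, **kwargs):
        # Use the absolute index instead of -1 to fetch the final info.
        counter = self.step_count.pop(episode.id_, 0)
        self.last_info[episode.id_] = episode.get_infos(counter)


class StubEpisode:
    """Hypothetical stand-in for a multi-agent episode, only for this demo."""

    def __init__(self, id_, reset_info):
        self.id_ = id_
        self._infos = [reset_info]  # index 0 is the info returned by reset

    def add_step(self, info):
        self._infos.append(info)

    def get_infos(self, index):
        return self._infos[index]


def run_demo(num_steps=16):
    """Simulates an episode in which every agent receives the same info each step."""
    tracker = LastInfoTracker()
    episode = StubEpisode("ep_0", {"agent_0": {"t": 0}, "agent_1": {"t": 0}})
    for t in range(1, num_steps + 1):
        episode.add_step({"agent_0": {"t": t}, "agent_1": {"t": t}})
        tracker.on_episode_step(episode=episode)
    tracker.on_episode_end(episode=episode)
    return tracker.last_info["ep_0"]
```

Keying the counter on the episode id keeps the bookkeeping correct if several episodes are stepped in an interleaved fashion; whether a positive index passed to the real `get_infos` counts environment steps from reset in exactly this way is an assumption worth verifying against the installed Ray version.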

Is this a bug that needs to be fixed?

RunLLM · November 18, 2025, 3:56am

It seems likely that this is a known issue with turn-based multi-agent environments in Ray RLlib, where only one agent acts per step and the info returned by get_infos(-1) in on_episode_step may not reflect the very last info dict for all agents. This behavior is consistent with the way RLlib tracks environment steps versus agent steps, and similar issues have been reported by other users, especially in turn-based or partially observed settings. As you note, the final info can be retrieved in on_episode_end by explicit index, which suggests the data is eventually synchronized but not always available at the expected time in on_episode_step for the last environment step.

My understanding is that this is not explicitly documented as a bug, but it is a limitation of the current implementation, particularly for turn-based environments where not all agents act every step. There are open discussions and related issues about info and observation synchronization in such cases, and workarounds (like using on_episode_end) are commonly used. If this behavior is blocking or confusing, it may be worth opening a GitHub issue to clarify or request a fix, as the current design may not fully support all turn-based multi-agent use cases out of the box. Would you like more detail on the underlying cause or possible workarounds?

