1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.
2. Environment:
Ray version: 2.50
Python version: 3.12
OS: unix
3. What happened vs. what you expected:
Expected: Calling episode.get_infos(-1) in CustomCallbacks should return the most recent info from the turn-based multi-agent environment.
Actual: It works for every step except the last one.
I gave all the agents the same info, and only one agent is allowed to take an action at each step.
If you call episode.get_infos(-1) in the on_episode_step method of CustomCallbacks, it returns the most recent info for every step except the last one. If the episode has 16 steps in total, get_infos(-1) returns the 13th info at the last step. However, you can get the last info by calling it in the on_episode_end method with a counter that tracks the correct index of that info (e.g., .get_infos(counter)).
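The counter workaround can be sketched as a callback like the one below. This is a minimal sketch, not a confirmed fix. It assumes the new API stack's `RLlibCallback` base class, that `on_episode_step` fires once per environment step including the final one, and that index 0 of the episode's infos holds the reset info, so the final counter value points at the last step's info. The class name `LastInfoCallback` and its `last_infos` attribute are hypothetical; adjust the index if your episodes turn out to be off by one.

```python
# Minimal sketch of the "counter + on_episode_end" workaround (assumptions noted inline).
try:
    # Assumption: Ray >= 2.40, where the callback base class is RLlibCallback.
    from ray.rllib.callbacks.callbacks import RLlibCallback
except ImportError:
    # Fallback so the sketch can be exercised without Ray installed.
    RLlibCallback = object


class LastInfoCallback(RLlibCallback):
    """Hypothetical callback that reads the final info once the episode has ended."""

    def __init__(self):
        super().__init__()
        # Hypothetical bookkeeping: episode id -> number of on_episode_step calls so far.
        self._step_counts = {}
        # Hypothetical result store: episode id -> infos read at the final index.
        self.last_infos = {}

    def on_episode_step(self, *, episode, **kwargs):
        # Count one call per environment step (assumed to include the final step).
        self._step_counts[episode.id_] = self._step_counts.get(episode.id_, 0) + 1

    def on_episode_end(self, *, episode, **kwargs):
        # Drop the bookkeeping entry and use the count as the final env-step index.
        counter = self._step_counts.pop(episode.id_, 0)
        # Assumption: index 0 is the reset info, so index `counter` is the last step's info.
        self.last_infos[episode.id_] = episode.get_infos(counter)
```

It would be registered with `config.callbacks(LastInfoCallback)`. Storing results on the instance is only for illustration; in practice you would more likely log them, for example through the `metrics_logger` argument RLlib passes to these hooks.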
This is likely a known issue with turn-based multi-agent environments in Ray RLlib. Because only one agent acts per step, the info returned by get_infos(-1) in on_episode_step may not reflect the very last info dict for all agents. This behavior is consistent with the way RLlib tracks environment steps separately from per-agent steps, and similar issues have been reported by other users, especially in turn-based or partially observed settings. Since the correct info is available in on_episode_end, the data does appear to be synchronized eventually; it just isn't always available at the expected time in on_episode_step for the final environment step.
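To make the environment-step versus agent-step distinction concrete, here is a toy model in plain Python. It is not RLlib's actual bookkeeping: it assumes two agents acting in strict round-robin order over 16 steps and ignores reset infos, and the helper names `turn_based_log` and `infos_at_env_step` are hypothetical. Indexing by environment step returns only the agent that acted at that step, while each agent's own last entry can point at a different environment step.

```python
from typing import Dict, List, Tuple


def turn_based_log(
    num_steps: int, agents: List[str]
) -> Tuple[Dict[str, List[dict]], List[str]]:
    """Toy turn-based episode: one agent acts per env step, in round-robin order."""
    per_agent_infos: Dict[str, List[dict]] = {a: [] for a in agents}
    actor_at_step: List[str] = []
    for t in range(num_steps):
        actor = agents[t % len(agents)]
        actor_at_step.append(actor)
        # Each agent only records an info on the steps where it acts.
        per_agent_infos[actor].append({"env_step": t})
    return per_agent_infos, actor_at_step


def infos_at_env_step(
    per_agent_infos: Dict[str, List[dict]], actor_at_step: List[str], t: int
) -> Dict[str, dict]:
    """Look up infos by env-step index (negative t counts from the end)."""
    if t < 0:
        t += len(actor_at_step)
    actor = actor_at_step[t]
    # Convert the env-step index into the acting agent's own (agent-step) index.
    local_idx = actor_at_step[:t].count(actor)
    return {actor: per_agent_infos[actor][local_idx]}


log, actors = turn_based_log(16, ["agent_0", "agent_1"])
# Env-step view of the last step: {'agent_1': {'env_step': 15}}
print(infos_at_env_step(log, actors, -1))
# Per-agent view: {'agent_0': {'env_step': 14}, 'agent_1': {'env_step': 15}}
print({a: infos[-1] for a, infos in log.items()})
```

In this toy model the two views disagree about which info is "last" for agent_0, which is the kind of mismatch that can make a single `-1` lookup surprising in turn-based episodes.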
My understanding is that this is not explicitly documented as a bug but is rather a limitation of the current implementation, particularly for turn-based environments where not every agent acts at every step. There are open discussions and related issues about info and observation synchronization in such cases, and workarounds such as reading the info in on_episode_end are commonly used. If this behavior is blocking or confusing, it may be worth opening a GitHub issue to clarify it or request a fix, since the current design may not fully support every turn-based multi-agent use case out of the box. Would you like more detail on the underlying cause or possible workarounds?