How does Ray RLlib handle individual agent EndEpisode calls in Unity 3D environments?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I’m working with the Unity 3DBall environment using ML-Agents and integrating it with Ray RLlib for multi-agent reinforcement learning. In Unity, each agent can call EndEpisode() and reset individually when using ML-Agents. However, when integrating with Ray RLlib, the environment only seems to reset when all agents are done.
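
For context, on the Python side individual EndEpisode() calls show up as terminal steps in the low-level ML-Agents API. This is only a hedged sketch I put together to illustrate what I mean (the editor connection, the random actions, and the loop structure follow the mlagents_envs docs as I understand them; exact details may differ across versions):

```python
# Hedged sketch: how per-agent EndEpisode() calls surface via the
# low-level mlagents_envs API (details may vary across versions).
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)  # None: connect to a running Editor
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Agents that called EndEpisode() this step appear individually in
    # terminal_steps, while the remaining agents stay in decision_steps.
    for agent_id in terminal_steps.agent_id:
        print(f"agent {agent_id} ended its episode")
    # Keep the remaining agents moving with random actions.
    env.set_actions(behavior_name, spec.action_spec.random_action(len(decision_steps)))
    env.step()

env.close()
```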

From reviewing the Ray RLlib code, I noticed that the environment reset happens collectively rather than individually for each agent. My questions are:

1. How does Ray RLlib handle the EndEpisode calls from individual agents?
2. Does Ray RLlib ignore the individual EndEpisode calls and wait until all agents are done before resetting the environment?
3. Is there a recommended approach for environments where individual agents have different episode lengths? (The sketch after this list shows the kind of per-agent done handling I have in mind.)
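
For reference, this is the kind of per-agent done handling I would expect to be able to express. It is only a toy sketch I wrote for illustration, not anything from the Unity wrapper: the agent ids, episode lengths, and the 4-tuple step return are assumptions (newer RLlib versions split dones into terminateds/truncateds):

```python
# Toy sketch (not the Unity wrapper): a MultiAgentEnv where agents end
# their episodes at different times via per-agent done flags.
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class PerAgentDoneEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.t = 0

    def reset(self):
        self.t = 0
        return {"agent_0": 0.0, "agent_1": 0.0}

    def step(self, action_dict):
        self.t += 1
        obs = {aid: float(self.t) for aid in action_dict}
        rewards = {aid: 1.0 for aid in action_dict}
        # agent_0 finishes after 5 steps, agent_1 after 10 steps; only
        # report dones for agents that are still active this step.
        done_at = {"agent_0": 5, "agent_1": 10}
        dones = {aid: self.t >= done_at[aid] for aid in action_dict}
        # "__all__" is what triggers a full env reset in RLlib.
        dones["__all__"] = self.t >= 10
        return obs, rewards, dones, {}
```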

I’m trying to understand whether the EndEpisode behavior in Unity is fully compatible with Ray RLlib or whether something additional needs to be configured or implemented.

Here is the Ray RLlib code I’m talking about: ray/rllib/env/wrappers/unity3d_env.py at commit d81f4d8fcd88e21831721c427de7797cf17817f6 in ray-project/ray on GitHub.

The step method returns done = False for every agent unless they are all done, which only happens when self.episode_timesteps > self.episode_horizon.
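
To make concrete what I think I’m seeing, here is a tiny self-contained reproduction of that done pattern. It is not the actual wrapper code: fake_wrapper_step and the agent ids are made up, and only episode_timesteps / episode_horizon mirror names from the source.

```python
# Toy reproduction of the done pattern described above
# (fake_wrapper_step and the agent ids are made up for illustration;
# only episode_timesteps / episode_horizon mirror names from the wrapper).

def fake_wrapper_step(episode_timesteps, episode_horizon, agent_ids):
    """Return a dones dict the way the wrapper appears to: no per-agent
    dones, only a global '__all__' once the horizon is exceeded."""
    if episode_timesteps > episode_horizon:
        # Horizon exceeded: everyone is flagged done and the env resets.
        return dict({"__all__": True}, **{aid: True for aid in agent_ids})
    # Otherwise no agent is ever individually flagged done, even if it
    # called EndEpisode() on the Unity side.
    return {"__all__": False}


if __name__ == "__main__":
    agents = ["agent_0", "agent_1"]
    print(fake_wrapper_step(10, 3000, agents))    # {'__all__': False}
    print(fake_wrapper_step(3001, 3000, agents))  # everyone True
```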

I don’t understand this behavior. Can someone explain?