How does Ray RLlib handle individual agent EndEpisode calls in Unity 3D environments?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

I’m working with the Unity 3DBall environment using ML-Agents and integrating it with Ray RLlib for multi-agent reinforcement learning. In Unity, each agent can call EndEpisode() and reset individually when using ML-Agents. However, when integrating with Ray RLlib, the environment only seems to reset when all agents are done.
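
For context, on the Python side individual EndEpisode() calls show up as terminal steps in the low-level ML-Agents API. This is only a hedged sketch I put together to illustrate what I mean (the editor connection, the random actions, and the loop structure follow the mlagents_envs docs as I understand them; exact details may differ across versions):

```python
# Hedged sketch: how per-agent EndEpisode() calls surface via the
# low-level mlagents_envs API (details may vary across versions).
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)  # None: connect to a running Editor
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Agents that called EndEpisode() this step appear individually in
    # terminal_steps, while the remaining agents stay in decision_steps.
    for agent_id in terminal_steps.agent_id:
        print(f"agent {agent_id} ended its episode")
    # Keep the remaining agents moving with random actions.
    env.set_actions(behavior_name, spec.action_spec.random_action(len(decision_steps)))
    env.step()

env.close()
```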

From reviewing the Ray RLlib code, I noticed that the environment reset happens collectively rather than individually for each agent. My questions are:

1. How does Ray RLlib handle the EndEpisode calls from individual agents?
2. Does Ray RLlib ignore the individual EndEpisode calls and wait until all agents are done before resetting the environment?
3. Is there a recommended approach for environments where individual agents have different episode lengths? (The sketch after this list shows the kind of per-agent done handling I have in mind.)
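
For reference, this is the kind of per-agent done handling I would expect to be able to express. It is only a toy sketch I wrote for illustration, not anything from the Unity wrapper: the agent ids, episode lengths, and the 4-tuple step return are assumptions (newer RLlib versions split dones into terminateds/truncateds):

```python
# Toy sketch (not the Unity wrapper): a MultiAgentEnv where agents end
# their episodes at different times via per-agent done flags.
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class PerAgentDoneEnv(MultiAgentEnv):
    def __init__(self, config=None):
        super().__init__()
        self.t = 0

    def reset(self):
        self.t = 0
        return {"agent_0": 0.0, "agent_1": 0.0}

    def step(self, action_dict):
        self.t += 1
        obs = {aid: float(self.t) for aid in action_dict}
        rewards = {aid: 1.0 for aid in action_dict}
        # agent_0 finishes after 5 steps, agent_1 after 10 steps; only
        # report dones for agents that are still active this step.
        done_at = {"agent_0": 5, "agent_1": 10}
        dones = {aid: self.t >= done_at[aid] for aid in action_dict}
        # "__all__" is what triggers a full env reset in RLlib.
        dones["__all__"] = self.t >= 10
        return obs, rewards, dones, {}
```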

I’m trying to understand whether the EndEpisode behavior in Unity is fully compatible with Ray RLlib or whether something additional needs to be configured or implemented.

Here is the Ray RLlib code I’m talking about: ray/rllib/env/wrappers/unity3d_env.py at commit d81f4d8fcd88e21831721c427de7797cf17817f6 in ray-project/ray on GitHub.

The step method returns done = False for every agent unless they are all done, which only happens when self.episode_timesteps > self.episode_horizon.
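
To make concrete what I think I’m seeing, here is a tiny self-contained reproduction of that done pattern. It is not the actual wrapper code: fake_wrapper_step and the agent ids are made up, and only episode_timesteps / episode_horizon mirror names from the source.

```python
# Toy reproduction of the done pattern described above
# (fake_wrapper_step and the agent ids are made up for illustration;
# only episode_timesteps / episode_horizon mirror names from the wrapper).

def fake_wrapper_step(episode_timesteps, episode_horizon, agent_ids):
    """Return a dones dict the way the wrapper appears to: no per-agent
    dones, only a global '__all__' once the horizon is exceeded."""
    if episode_timesteps > episode_horizon:
        # Horizon exceeded: everyone is flagged done and the env resets.
        return dict({"__all__": True}, **{aid: True for aid in agent_ids})
    # Otherwise no agent is ever individually flagged done, even if it
    # called EndEpisode() on the Unity side.
    return {"__all__": False}


if __name__ == "__main__":
    agents = ["agent_0", "agent_1"]
    print(fake_wrapper_step(10, 3000, agents))    # {'__all__': False}
    print(fake_wrapper_step(3001, 3000, agents))  # everyone True
```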

I don’t understand this behavior. Can someone explain?