How do you handle agents exiting the environment early?

mickelliu · May 4, 2022, 2:13pm

Hi all, I’ve recently been experimenting with the neuralMMO environment, where you simultaneously control 8 agents on the same team to defeat the other 15 eight-agent teams. There are competitive elements that allow attacking agents on hostile teams and knocking them out of the game. If an agent dies early, its done state is set to True while the environment is still running and your team is still alive.
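To make the situation concrete, here is a minimal sketch of what the per-agent done dict might look like on a step where one agent dies. The agent IDs and the `make_dones` helper are hypothetical, made up for illustration; the `"__all__"` key follows the RLlib MultiAgentEnv convention for marking the end of the whole episode.

```python
from typing import Dict, Set


def make_dones(alive: Set[str], died_now: Set[str], episode_over: bool) -> Dict[str, bool]:
    """Build a per-agent done dict for one step (hypothetical helper).

    Only agents that were alive this step get an entry; "__all__"
    signals that the episode has ended for every agent.
    """
    dones = {agent: (agent in died_now) or episode_over for agent in alive}
    dones["__all__"] = episode_over
    return dones


# One team of 8 agents; agent_3 is killed while the episode keeps running.
team = {f"agent_{i}" for i in range(8)}
dones = make_dones(team, died_now={"agent_3"}, episode_over=False)
```

Here only `agent_3` is flagged as done, while `"__all__"` stays False because the rest of the team is still playing.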

However, having entries equal to True anywhere other than the last position in the done key of SampleBatch is clearly not acceptable, as it triggers the following error:

Batches sent to postprocessing must only contain steps from a single trajectory

So I assume you should never set any done key to True unless it is the true terminal state for the entire environment. In retrospect, I can sort of understand why this restriction is in place. Maybe you don’t want trajectories of unequal length among the agents, which could cause issues for the trainer or require starting a new episode just to fill in the missing experiences in those shorter-than-expected trajectories.

I saw that several posts and GitHub issues have mentioned this already, but none of them got a proper solution from the Ray devs.
https://github.com/ray-project/ray/issues/10761
https://discuss.ray.io/t/setting-multi-agent-early-exit-from-custom-env/2087

Thanks everyone.

mickelliu · May 5, 2022, 8:15am

Never mind, it turns out it was a bug in my code. You shouldn’t pad dummy observations, rewards, dones, or infos into the trajectories. If you are using MultiAgentBatch, you should leave it as it is.
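A rough sketch of what "no padding" can look like, using a toy environment rather than the real neuralMMO or RLlib classes. `ToyTeamEnv`, its `death_step` schedule, and the rollout loop are all assumptions made up for illustration. Once an agent is done, it simply stops appearing in the returned dicts, so each agent's trajectory has exactly one done=True, on its last step.

```python
from typing import Dict, List, Tuple


class ToyTeamEnv:
    """Toy stand-in for a multi-agent env (not the neuralMMO or RLlib API)."""

    def __init__(self, death_step: Dict[str, int], n_agents: int = 4, max_steps: int = 5):
        self.agent_ids = [f"agent_{i}" for i in range(n_agents)]
        self.death_step = death_step  # hypothetical schedule: agent -> step it dies on
        self.max_steps = max_steps

    def reset(self) -> Dict[str, int]:
        self.t = 0
        self.alive = list(self.agent_ids)
        return {a: self.t for a in self.alive}

    def step(self, actions: Dict[str, int]):
        self.t += 1
        episode_over = self.t >= self.max_steps
        obs, rewards, dones = {}, {}, {}
        for a in actions:  # only agents that acted get entries, no dummies
            obs[a] = self.t
            rewards[a] = 1.0
            dones[a] = episode_over or self.death_step.get(a) == self.t
        dones["__all__"] = episode_over
        # Dead agents are dropped, so they never show up in later steps.
        self.alive = [a for a in self.alive if not dones.get(a, False)]
        return obs, rewards, dones


env = ToyTeamEnv(death_step={"agent_1": 2, "agent_3": 4})
trajectories: Dict[str, List[Tuple[int, float, bool]]] = {a: [] for a in env.agent_ids}
obs = env.reset()
done_all = False
while not done_all:
    # Act only for agents that are still alive.
    actions = {a: 0 for a in obs if a in env.alive}
    obs, rewards, dones = env.step(actions)
    for a in actions:
        trajectories[a].append((obs[a], rewards[a], dones[a]))
    done_all = dones["__all__"]

lengths = {a: len(steps) for a, steps in trajectories.items()}
```

The per-agent trajectories end up with different lengths (`agent_1` is shorter than `agent_0`), and nothing is filled in after an agent's death.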
