Hi all, recently I’m experimenting with the neuralMMO environment, where you need to simultaneously control 8 agents on the same team to defeat the other 15 eight-agents teams. There are competitive elements that allow attacking other agents in the hostile teams and knocking them off the game. If an agent dies early, its respective done state is set to True while the environment is still running and your team is still alive.
However, having entries equal to true beside the last entries in the done key of SampleBatch is clearly not acceptable due to the following error
Batches sent to postprocessing must only contain steps from a single trajectory
So I assume you should never set any done key to true unless it is the true terminal state for the entire environment. In the retrospect, I can sort of understand why this restriction was in place. Maybe you don’t want unequal length trajectory among the agents, causing issues for the trainer, or having the need to start a new episode just to fill the missing experiences in those shorter-than-expected trajectories.
I saw several posts and GitHub issues mentioned this already but didn’t get a proper solution from the ray devs.
https://github.com/ray-project/ray/issues/10761
https://discuss.ray.io/t/setting-multi-agent-early-exit-from-custom-env/2087
Thanks everyone.