How do you handle agents exiting the environment early?

Hi all, I've recently been experimenting with the Neural MMO environment, where you simultaneously control 8 agents on the same team to defeat 15 other eight-agent teams. There are competitive elements that allow attacking agents on hostile teams and knocking them out of the game. If an agent dies early, its done state is set to True while the environment is still running and the rest of your team is still alive.
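To make the setup concrete, here is a minimal sketch of the per-agent done dictionary I mean. It assumes the RLlib multi-agent convention of a special `__all__` key that marks the end of the whole episode; the helper `step_dones` and the agent IDs are made up for illustration and are not Neural MMO's actual API.

```python
def step_dones(alive, env_finished):
    """Build a per-agent done dict for one step (hypothetical helper).

    alive: dict mapping agent_id -> bool (still alive after this step)
    env_finished: True only when the whole environment has terminated
    """
    # An agent is done as soon as it dies, even if its team fights on.
    dones = {agent_id: not is_alive for agent_id, is_alive in alive.items()}
    # "__all__" ends the episode: env finished, or nobody on the team is left.
    dones["__all__"] = env_finished or not any(alive.values())
    return dones


# agent_1 was knocked out this step, but the episode keeps running.
dones = step_dones(
    {"agent_0": True, "agent_1": False, "agent_2": True},
    env_finished=False,
)
```

So a single agent can be done while `__all__` is still False.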

However, having entries equal to True anywhere other than the last position of the done key in a SampleBatch is clearly not accepted, as it raises the following error:

Batches sent to postprocessing must only contain steps from a single trajectory

So I assume you should never set any done key to True unless it is the true terminal state of the entire environment. In retrospect, I can sort of understand why this restriction is in place. Maybe you don't want trajectories of unequal length among the agents, which could cause issues for the trainer or require starting a new episode just to fill in the missing experiences in those shorter-than-expected trajectories.

I've seen several posts and GitHub issues mentioning this already, but none of them got a proper solution from the Ray devs.

Thanks everyone.


Never mind, it turns out it was a bug in my code. You shouldn't pad the trajectories with dummy observations, rewards, dones or infos. If you are using MultiAgentBatch, leave each agent's trajectory as it is.
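For anyone landing here later, here is a rough sketch of what "leave it as it is" means. It is plain Python, not RLlib's actual classes: `collect_trajectories` and the step tuples are hypothetical names for illustration. Each agent keeps its own trajectory, which simply stops at the step where that agent is done, so the only True done flag in a trajectory is its last entry.

```python
from collections import defaultdict


def collect_trajectories(steps):
    """Group per-step dicts into per-agent trajectories without padding.

    steps: list of (obs, rewards, dones) dicts keyed by agent_id. Agents
    that already died are simply absent from later steps (assumption).
    """
    trajectories = defaultdict(list)
    finished = set()
    for obs, rewards, dones in steps:
        for agent_id, ob in obs.items():
            if agent_id in finished:
                continue  # never append dummy steps for a dead agent
            trajectories[agent_id].append(
                {"obs": ob, "reward": rewards[agent_id], "done": dones[agent_id]}
            )
            if dones[agent_id]:
                finished.add(agent_id)
    return dict(trajectories)


# a1 dies at the second step; a0 survives until the episode ends.
steps = [
    ({"a0": 0, "a1": 0}, {"a0": 0.0, "a1": 0.0},
     {"a0": False, "a1": False, "__all__": False}),
    ({"a0": 1, "a1": 1}, {"a0": 0.0, "a1": -1.0},
     {"a0": False, "a1": True, "__all__": False}),
    ({"a0": 2}, {"a0": 1.0}, {"a0": True, "__all__": True}),
]
trajs = collect_trajectories(steps)
lengths = {agent_id: len(t) for agent_id, t in trajs.items()}
```

The trajectories end up with different lengths, and that is fine. Padding the shorter one with dummy steps after its done flag is what put a True in the middle of my batch.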
