Is there a way to handle a variable number of agents? In my custom environment, agents can die or appear within a single episode, so their number changes. Because of this, I get a different batch size for an agent's own observations and for the other agents' observations when I try to handle them in the centralized critic. Right now I reset the observations of dead agents and keep storing them for the rest of the episode, but I don't think this approach is effective.
What I do in this case is similar to what you describe. I always return all the agents that have existed in the environment (in my case at most 13), and dead agents get an observation of all zeros and a reward of 0. That works well for me.
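Here is a minimal sketch of that padding approach, assuming a fixed pool of 13 agent slots. The `agent_{i}` ID scheme, `OBS_DIM`, and the helper names are hypothetical and not taken from any library. Every slot is present at every step, so the centralized critic always receives an input of the same size. The alive mask is an optional extra so the critic can tell a dead agent apart from a genuine all-zero observation.

```python
import numpy as np

MAX_AGENTS = 13  # upper bound on agents that can ever exist (13 in the post above)
OBS_DIM = 4      # hypothetical size of one agent's observation vector

# Hypothetical fixed ID scheme: one slot per possible agent.
AGENT_IDS = [f"agent_{i}" for i in range(MAX_AGENTS)]


def pad_step(alive_obs, alive_rewards):
    """Return obs/rewards for every slot; dead or not-yet-spawned agents get zeros.

    alive_obs / alive_rewards: dicts keyed by agent ID, containing only live agents.
    Returns (obs, rewards, alive_mask).
    """
    obs, rewards = {}, {}
    alive_mask = np.zeros(MAX_AGENTS, dtype=np.float32)
    for i, aid in enumerate(AGENT_IDS):
        if aid in alive_obs:
            obs[aid] = np.asarray(alive_obs[aid], dtype=np.float32)
            rewards[aid] = float(alive_rewards[aid])
            alive_mask[i] = 1.0
        else:
            # Placeholder for an absent agent: all-zero observation, zero reward.
            obs[aid] = np.zeros(OBS_DIM, dtype=np.float32)
            rewards[aid] = 0.0
    return obs, rewards, alive_mask


def central_critic_input(obs):
    """Concatenate all slots in a fixed order, giving a constant-size critic input."""
    return np.concatenate([obs[aid] for aid in AGENT_IDS])
```

Because the slot order never changes, the same index always refers to the same agent, which keeps the critic's input layout consistent across an episode.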
Thank you for the answer! I did this, but new agents that spawn during an episode end up with a smaller batch size than the older ones.
I want to try defining the observation function with a Repeated space for the opponents' actions, but I'm not sure it will work.
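RLlib does provide a `Repeated` space (in `ray.rllib.utils.spaces.repeated`) for variable-length lists with a maximum length. The pure-NumPy sketch below does not use it. It only illustrates the underlying idea, roughly how such a space keeps shapes fixed: pad the variable-length list to a maximum size and keep a mask of which entries are real. `MAX_OPPONENTS`, `ACTION_DIM`, and the function names are hypothetical. The masked mean is one possible permutation-invariant way to summarize the opponents and is not part of the original discussion.

```python
import numpy as np

MAX_OPPONENTS = 12  # hypothetical cap on how many opponents one agent can see
ACTION_DIM = 2      # hypothetical size of a single opponent's action vector


def encode_opponent_actions(actions):
    """Pack a variable-length list of opponent actions into fixed-size arrays.

    Returns (padded, mask, length): padded has shape (MAX_OPPONENTS, ACTION_DIM),
    mask is 1.0 for rows holding a real opponent and 0.0 for padding.
    """
    n = len(actions)
    if n > MAX_OPPONENTS:
        raise ValueError(f"got {n} opponents, max is {MAX_OPPONENTS}")
    padded = np.zeros((MAX_OPPONENTS, ACTION_DIM), dtype=np.float32)
    mask = np.zeros(MAX_OPPONENTS, dtype=np.float32)
    for i, a in enumerate(actions):
        padded[i] = a
        mask[i] = 1.0
    return padded, mask, n


def masked_mean(padded, mask):
    """Average only the real opponent rows; padding rows are ignored."""
    count = max(float(mask.sum()), 1.0)  # avoid division by zero when no opponents
    return (padded * mask[:, None]).sum(axis=0) / count
```

The shape of `padded` never depends on how many opponents are currently alive, so batches built from these observations stay the same size even as agents die or spawn.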