Setting multi-agent early exit from a custom env

Hi,

I am using a custom env. Theoretically it allows for early exit of one agent.
When I set up the dones dict in the step function like this:

    if white_done and black_done:
        new_done = dict({"__all__": True})
    elif white_done and not black_done:
        new_done = dict({"white": done, "__all__": False})
    elif black_done and not white_done:
        new_done = dict({"black": done, "__all__": False})
    else:
        new_done = dict({"__all__": False})

I get an error: Batches sent to postprocessing must only contain steps from a single trajectory

When I pass in this instead:

    if white_done and black_done:
        new_done = dict({"__all__": True})
    elif white_done and not black_done:
        new_done = dict({"__all__": False})
    elif black_done and not white_done:
        new_done = dict({"__all__": False})
    else:
        new_done = dict({"__all__": False})

It works. However, RLlib then also expects an obs dict with both agents' information, even though one agent may already be done.

How can I resolve this?

Furthermore, what happens now is that only one of the agents, White or Black, gets 'done' per episode, rather than both.

Hi @kia,

This probably won't solve it entirely, but I think the intention is that every agent that has a value in one of the dictionaries has a value in all of them. So while your agents are not done, they should still have an entry in the done dictionary: {"white": False, …}. It seems to be working without that, so maybe it is not needed, but that is how I do it. You will also need to decide on a terminal observation and reward for the agent that exits early. I think that once an agent has returned done it no longer needs to appear in the dictionary, but I have not checked that.
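To make the suggestion above concrete, here is a minimal sketch of a dummy 2-player env whose step function keeps the four return dicts in sync. It is plain Python and does not import RLlib; the class name DummyTwoPlayerEnv, the step limits, and the integer observations are all made up for illustration. In a real setup the class would presumably subclass RLlib's MultiAgentEnv. Dropping an agent from the dicts after it has returned done is the unchecked assumption mentioned above.

```python
from typing import Dict

AGENTS = ("white", "black")


class DummyTwoPlayerEnv:
    """Hypothetical 2-player env: each agent finishes after its own step limit."""

    def __init__(self, white_limit: int = 2, black_limit: int = 4):
        self.limits = {"white": white_limit, "black": black_limit}
        self.t = 0
        self.finished = set()

    def reset(self) -> Dict[str, int]:
        self.t = 0
        self.finished = set()
        # Both agents get an initial observation.
        return {agent: 0 for agent in AGENTS}

    def step(self, actions: Dict[str, int]):
        self.t += 1
        obs, rewards, dones, infos = {}, {}, {}, {}
        for agent in actions:
            if agent in self.finished:
                continue  # agent already returned done earlier: leave it out
            done = self.t >= self.limits[agent]
            # Every agent that appears in one dict appears in all of them.
            obs[agent] = self.t  # acts as the terminal observation when done
            rewards[agent] = 1.0 if done else 0.0
            dones[agent] = done
            infos[agent] = {}
            if done:
                self.finished.add(agent)
        # The episode ends only once every agent has finished.
        dones["__all__"] = len(self.finished) == len(AGENTS)
        return obs, rewards, dones, infos
```

With the default limits, White returns done on step 2 together with a terminal observation and reward, Black keeps stepping alone, and "__all__" only becomes True on step 4 when Black is done as well.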

Out of the two snippets I provided, the latter works. I guess it's just a difficult env to solve.

Hey @kia, at first glance this seems like a bug. The first script also looks OK and should work.
Could you provide a small self-contained script with a dummy 2-player env that reproduces the error?