I am using a custom env. In theory it allows one agent to exit early.
When I set up the dones dict in the step function like this (note: Discourse renders `__all__` as bold, so it may appear as "all" below):

```python
if white_done and black_done:
    new_done = {"__all__": True}
elif white_done and not black_done:
    new_done = {"white": True, "__all__": False}
elif black_done and not white_done:
    new_done = {"black": True, "__all__": False}
else:
    new_done = {"__all__": False}
```
I get an error: `Batches sent to postprocessing must only contain steps from a single trajectory`.
When I pass in:

```python
if white_done and black_done:
    new_done = {"__all__": True}
elif white_done and not black_done:
    new_done = {"__all__": False}
elif black_done and not white_done:
    new_done = {"__all__": False}
else:
    new_done = {"__all__": False}
```
it works. However, RLlib also expects an obs dict with both agents' information, even though one agent may be done.
This probably won't solve it entirely, but I think the intention is that every agent that has a value in one of the dictionaries should have a value in all of them. So when your agents are not done, they should still have an entry in the dones dictionary, e.g. `{"white": False, ...}`. It seems to work without that, so maybe it is not needed, but that is how I do it. You will also need to decide on a terminal observation and reward. I think once an agent has returned done it no longer needs to appear in the dictionaries, but I have not checked that.
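To illustrate, here is a minimal sketch of a `step()` return where every agent appearing in one dict appears in all of them. The agent IDs `"white"`/`"black"` come from the thread, but the helper name, the dummy rewards, and the `TERMINAL_OBS` placeholder are my assumptions, not something from RLlib itself:

```python
# Hypothetical placeholder terminal observation for an agent that is done.
TERMINAL_OBS = [0.0, 0.0]

def build_step_return(white_obs, black_obs, white_done, black_done):
    """Build (obs, rewards, dones, infos) with consistent agent keys."""
    obs = {
        "white": TERMINAL_OBS if white_done else white_obs,
        "black": TERMINAL_OBS if black_done else black_obs,
    }
    rewards = {"white": 0.0, "black": 0.0}  # dummy rewards for illustration
    dones = {
        "white": white_done,
        "black": black_done,
        # The special "__all__" key tells RLlib when the whole episode ends.
        "__all__": white_done and black_done,
    }
    infos = {"white": {}, "black": {}}
    return obs, rewards, dones, infos

# White is done, black is not: both still appear in every dict.
obs, rewards, dones, infos = build_step_return([1.0], [2.0], True, False)
```

The key point is that `obs`, `rewards`, `dones`, and `infos` all share the same agent keys for as long as an agent is part of the episode.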
Hey @kia, at first glance this seems like a bug. The first script also looks ok and should work.
Could you provide a small, self-contained script with a dummy 2-player env that reproduces the error?
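For reference, a dummy 2-player env for such a repro might look roughly like the sketch below. All names and the step counts are made up; a real repro would subclass `ray.rllib.env.MultiAgentEnv` and register the env with a trainer, which is omitted here to keep it self-contained:

```python
# Plain-Python stand-in for a MultiAgentEnv where "white" exits early.
class DummyTwoPlayerEnv:
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return {"white": [0.0], "black": [0.0]}

    def step(self, action_dict):
        self.t += 1
        white_done = self.t >= 2  # white's trajectory ends after 2 steps
        black_done = self.t >= 4  # black keeps playing until step 4
        obs = {"white": [float(self.t)], "black": [float(self.t)]}
        rewards = {"white": 1.0, "black": 1.0}
        dones = {
            "white": white_done,
            "black": black_done,
            "__all__": white_done and black_done,
        }
        return obs, rewards, dones, {"white": {}, "black": {}}

env = DummyTwoPlayerEnv()
obs = env.reset()
```

Driving this env with a trivial policy for a few steps should show whether the per-agent done entries trigger the postprocessing error.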