Different episode segmentations for different agents in multiagent?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • High: It blocks me from completing my task.

I am looking at a setting that would have different episode segmentations for different agents in a multiagent setting. E.g. for a sequence of 100 steps, this would be one single long episode for agent A, but 10 episodes of 10 steps each for agent B. Is it possible at all to do this in rllib?

If I just return dones["agent_B"] = True from the environment every 10 steps, I get an error from SimpleListCollector, saying “Batches sent to postprocessing must only contain steps from a single trajectory.” If I also set dones["__all__"] = True on every 10th step, I don’t get that error, but I assume it will then treat the episode as ended for agent A as well.

Is there a simple way to do what I am trying to do in RLlib? Could I force each agent to have its own SampleCollector? If not, could I subclass SampleCollector and change things there to allow different episode segmentations, or do other parts of RLlib also expect that one episode for agent A = one episode for agent B? Finally, if postprocessing is the issue, could I somehow segment the episode for agent B after the fact, e.g. by returning the episode ends in the infos dict instead, and setting dones to True in a callback after postprocessing?
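For concreteness, here is roughly what I am returning from step() (a sketch; the agent names "agent_A"/"agent_B" are just my placeholder IDs):

```python
# Every 10th step, B's segment ends but A's episode continues.
# Returning this dones dict triggers the SimpleListCollector error:
dones = {"agent_A": False, "agent_B": True, "__all__": False}

# Also setting "__all__" avoids the error, but presumably ends
# the episode for agent A as well, which is not what I want:
dones_with_all = {"agent_A": False, "agent_B": True, "__all__": True}
```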

@mgerstgrasser,

If I understand what you are asking, you could do this by creating a new agent ID for agent B every 10 steps and assigning all the agent IDs that start with "B" to the same policy.

config["multiagent"]["policies"] = {"A": (...), "B": (...)}
config["multiagent"]["policy_mapping_fn"] = lambda agent_id, **kwargs: agent_id[0]

For this example, let's assume it is every 5 steps instead of 10, to save space.
Your environment observations would look like this on each step:

A,B1
A,B1
A,B1
A,B1
A,B1 (B1 done)
A,B2
A,B2
A,B2
A,B2
A,B2 (B2 done)
A,B3
A,B3
A,B3
A,B3
A,B3 (B3 done)
....
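The rotation above can be sketched as an environment that retires the current B agent every SEGMENT steps and introduces a fresh one. This is a minimal illustration, not RLlib's API: the class below only mimics the MultiAgentEnv dict convention (obs/rewards/dones keyed by agent ID) so the ID-rotation logic is easy to follow, and all observation/reward values are placeholders.

```python
class RotatingAgentEnv:
    """Agent A runs one long episode; agent B is replaced every SEGMENT
    steps by a fresh agent ID (B1, B2, B3, ...), so each B-episode is
    short while A's episode spans the whole horizon."""

    SEGMENT = 5   # B's episode length (5 steps, matching the table above)
    HORIZON = 15  # total steps before the episode ends for everyone

    def __init__(self):
        self.t = 0
        self.b_index = 1  # which B agent is currently alive

    @property
    def b_id(self):
        return f"B{self.b_index}"

    def reset(self):
        self.t = 0
        self.b_index = 1
        return {"A": 0.0, self.b_id: 0.0}  # placeholder observations

    def step(self, action_dict):
        self.t += 1
        b = self.b_id
        b_done = self.t % self.SEGMENT == 0  # current B's segment ends
        all_done = self.t >= self.HORIZON    # whole episode ends

        obs = {"A": float(self.t), b: float(self.t)}
        rewards = {"A": 0.0, b: 0.0}         # placeholder rewards
        dones = {"A": all_done, b: b_done or all_done, "__all__": all_done}

        if b_done and not all_done:
            self.b_index += 1  # next step introduces B2, B3, ...
        return obs, rewards, dones, {}


# The mapping function from the reply: every "B<n>" ID maps to policy "B",
# "A" maps to policy "A".
policy_mapping_fn = lambda agent_id, **kwargs: agent_id[0]
```

Because each B<n> ID is brand new, the sample collector sees each of B's segments as a separate trajectory, while policy "B" still trains on all of them.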

Ohhhh. Neat! Yes, I think that would do what I’m wanting to do. Thank you!!