Different episode segmentations for different agents in multiagent?

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • High: It blocks me from completing my task.

I am looking at a setting that would have different episode segmentations for different agents in a multiagent setting. E.g. for a sequence of 100 steps, this would be one single long episode for agent A, but 10 episodes of 10 steps each for agent B. Is it possible at all to do this in rllib?

If I just return dones["agent_B"] = True from the environment every 10 steps, I get an error from SimpleListCollector, saying “Batches sent to postprocessing must only contain steps from a single trajectory.” If I also set dones["__all__"] = True on every 10th step, I don’t get that error, but I assume it will then treat the episode as ended for agent A as well.

Is there a simple way to do what I am trying to do in RLlib? Could I force each agent to have its own SampleCollector? If not, could I subclass SampleCollector and change things there to allow different episode segmentations, or do other parts of RLlib also expect that one episode for agent A = one episode for agent B? Finally, if postprocessing is the issue, could I somehow segment the episode for agent B after the fact, e.g. by returning the episode ends in the infos dict instead, and setting dones to True in a callback after postprocessing?
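For concreteness, here is roughly what I am returning from step() (a sketch; the agent names "agent_A"/"agent_B" are just my placeholder IDs):

```python
# Every 10th step, B's segment ends but A's episode continues.
# Returning this dones dict triggers the SimpleListCollector error:
dones = {"agent_A": False, "agent_B": True, "__all__": False}

# Also setting "__all__" avoids the error, but presumably ends
# the episode for agent A as well, which is not what I want:
dones_with_all = {"agent_A": False, "agent_B": True, "__all__": True}
```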

@mgerstgrasser,

If I understand what you are asking, you could do this by creating a new agent ID for agent B every 10 steps and assigning all the agent IDs that start with "B" to the same policy.

config["multiagent"]["policies"] = {"A": (...), "B": (...)}
config["multiagent"]["policy_mapping_fn"] = lambda agent_id, **kwargs: agent_id[0]

For this example, let's assume it is every 5 steps instead of 10, to save space.
Your environment observations would look like this on each step:

A,B1
A,B1
A,B1
A,B1
A,B1 (B1 done)
A,B2
A,B2
A,B2
A,B2
A,B2 (B2 done)
A,B3
A,B3
A,B3
A,B3
A,B3 (B3 done)
....
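The rotation above can be sketched as an environment that retires the current B agent every SEGMENT steps and introduces a fresh one. This is a minimal illustration, not RLlib's API: the class below only mimics the MultiAgentEnv dict convention (obs/rewards/dones keyed by agent ID) so the ID-rotation logic is easy to follow, and all observation/reward values are placeholders.

```python
class RotatingAgentEnv:
    """Agent A runs one long episode; agent B is replaced every SEGMENT
    steps by a fresh agent ID (B1, B2, B3, ...), so each B-episode is
    short while A's episode spans the whole horizon."""

    SEGMENT = 5   # B's episode length (5 steps, matching the table above)
    HORIZON = 15  # total steps before the episode ends for everyone

    def __init__(self):
        self.t = 0
        self.b_index = 1  # which B agent is currently alive

    @property
    def b_id(self):
        return f"B{self.b_index}"

    def reset(self):
        self.t = 0
        self.b_index = 1
        return {"A": 0.0, self.b_id: 0.0}  # placeholder observations

    def step(self, action_dict):
        self.t += 1
        b = self.b_id
        b_done = self.t % self.SEGMENT == 0  # current B's segment ends
        all_done = self.t >= self.HORIZON    # whole episode ends

        obs = {"A": float(self.t), b: float(self.t)}
        rewards = {"A": 0.0, b: 0.0}         # placeholder rewards
        dones = {"A": all_done, b: b_done or all_done, "__all__": all_done}

        if b_done and not all_done:
            self.b_index += 1  # next step introduces B2, B3, ...
        return obs, rewards, dones, {}


# The mapping function from the reply: every "B<n>" ID maps to policy "B",
# "A" maps to policy "A".
policy_mapping_fn = lambda agent_id, **kwargs: agent_id[0]
```

Because each B<n> ID is brand new, the sample collector sees each of B's segments as a separate trajectory, while policy "B" still trains on all of them.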

Ohhhh. Neat! Yes, I think that would do what I’m wanting to do. Thank you!!