How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
- High: It blocks me from completing my task.
I am looking at a setting that would have different episode segmentations for different agents in a multi-agent setting. E.g. for a sequence of 100 steps, this would be one single long episode for agent A, but 10 episodes of 10 steps each for agent B. Is it possible at all to do this in RLlib?
If I just return `dones["agent_B"] = True` from the environment every 10 steps, I get an error from `SimpleListCollector` saying "Batches sent to postprocessing must only contain steps from a single trajectory." If I also set `dones["__all__"] = True` on every 10th step, I don't get that error, but I assume it will then treat the episode as ended for agent A as well.
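The done pattern described above can be sketched in plain Python. This does not import Ray and is not claimed to be accepted by RLlib; the class `TwoHorizonEnv`, the two length constants, and the `segment_end` info key are hypothetical, and the four-dict return only mirrors the `(obs, rewards, dones, infos)` layout used with `dones` above.

```python
from typing import Dict, Tuple

EPISODE_LEN_A = 100  # agent A: one long episode of 100 steps
SEGMENT_LEN_B = 10   # agent B: a new "episode" (segment) every 10 steps


class TwoHorizonEnv:
    """Hypothetical toy env with per-agent episode boundaries.

    Not an RLlib MultiAgentEnv subclass; it only mimics the
    (obs, rewards, dones, infos) dict-per-agent return layout.
    """

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> Dict[str, int]:
        self.t = 0
        return {"agent_A": 0, "agent_B": 0}

    def step(self, actions: Dict[str, int]) -> Tuple[dict, dict, dict, dict]:
        self.t += 1
        # Agent B's observation restarts with each 10-step segment.
        obs = {"agent_A": self.t, "agent_B": self.t % SEGMENT_LEN_B}
        rewards = {agent_id: 0.0 for agent_id in actions}
        b_segment_end = self.t % SEGMENT_LEN_B == 0
        all_done = self.t >= EPISODE_LEN_A
        dones = {
            "agent_A": all_done,      # A only ends at step 100
            "agent_B": b_segment_end, # B ends every 10 steps
            "__all__": all_done,      # the whole episode ends at step 100
        }
        # Also expose B's boundary in infos (hypothetical key).
        infos = {"agent_A": {}, "agent_B": {"segment_end": b_segment_end}}
        return obs, rewards, dones, infos
```

Over 100 steps, `dones["agent_B"]` is `True` ten times, while `dones["agent_A"]` and `dones["__all__"]` are `True` only on the last step.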
Is there a simple way to do what I am trying to do in RLlib? Could I force each agent to have its own `SampleCollector`? If not, could I subclass `SampleCollector` and change things there to allow different episode segmentations, or do other parts of RLlib also expect that one episode for agent A equals one episode for agent B? Finally, if postprocessing is the issue, could I somehow segment the episode for agent B after the fact, e.g. by returning the episode ends in the `infos` dict instead, and setting `dones` to `True` in a callback after postprocessing?
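The after-the-fact segmentation idea can be illustrated in plain Python: recover agent B's segment boundaries from a hypothetical `segment_end` info flag, split the trajectory there, and compute discounted returns that restart at each boundary so no reward leaks across segments. This is a sketch under those assumptions, not RLlib's postprocessing or callback API; it suggests that the boundaries need to be known before any return or advantage computation runs.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def dones_from_infos(infos: Sequence[Dict[str, bool]]) -> List[bool]:
    """Read per-step segment ends from a hypothetical 'segment_end' info key."""
    return [bool(info.get("segment_end", False)) for info in infos]


def split_segments(items: Sequence[T], dones: Sequence[bool]) -> List[List[T]]:
    """Split one long trajectory into sub-episodes ending where dones[t] is True."""
    segments: List[List[T]] = []
    current: List[T] = []
    for item, done in zip(items, dones):
        current.append(item)
        if done:
            segments.append(current)
            current = []
    if current:  # trailing, unfinished segment
        segments.append(current)
    return segments


def returns_to_go(rewards: Sequence[float], dones: Sequence[bool],
                  gamma: float) -> List[float]:
    """Discounted return-to-go that does not propagate across segment ends."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # If step t ends a segment, nothing after it contributes.
        running = rewards[t] + gamma * (0.0 if dones[t] else running)
        out[t] = running
    return out
```

For example, with four rewards of `1.0`, segment ends after steps 1 and 3, and `gamma = 0.5`, each two-step segment gets returns `[1.5, 1.0]` independently.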