Special case MultiAgent Environment

Hi. I want to try a special multi-agent environment, specifically a hierarchical one.

The high-level agent runs for the first N1 steps.
After that, the low-level agent runs for a maximum of N2 steps.
Once the low-level agent finishes, i.e. at step N1+N2 at the latest, the reward is given to the high-level agent. All rewards before that point are 0. So the low-level agent's episode is basically executed within the high-level agent's last step.
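
The schedule above can be sketched in plain Python roughly as follows. This is only an illustration under assumptions, not RLlib code: the class name, the agent ids "high" and "low", the STOP action, the integer placeholder observations and the final reward value are all hypothetical. It only borrows the per-agent dict convention (with an "__all__" done flag) that RLlib's MultiAgentEnv uses.

```python
STOP = -1  # hypothetical low-level action meaning "finish early"


class HierarchicalScheduleEnv:
    """Plain-Python sketch of the N1/N2 schedule; imports nothing from RLlib."""

    def __init__(self, n1=3, n2=2, final_reward=1.0):
        self.n1 = n1                      # number of high-level steps
        self.n2 = n2                      # maximum number of low-level steps
        self.final_reward = final_reward  # placeholder for the real task reward

    def reset(self):
        self.high_steps = 0
        self.low_steps = 0
        # Only the high-level agent observes (and therefore acts) at the start.
        return {"high": 0}

    def step(self, action_dict):
        if "high" in action_dict:
            self.high_steps += 1
            # After the N1-th high-level action, hand control to the low level.
            next_agent = "high" if self.high_steps < self.n1 else "low"
            obs = {next_agent: self.high_steps}
            return obs, {"high": 0.0}, {"__all__": False}, {}

        # Low-level phase: runs until it stops itself or hits the N2 cap.
        self.low_steps += 1
        finished = self.low_steps >= self.n2 or action_dict["low"] == STOP
        if not finished:
            return {"low": self.low_steps}, {"low": 0.0}, {"__all__": False}, {}

        # Only now does the high-level agent receive its single nonzero reward.
        obs = {"high": self.high_steps, "low": self.low_steps}
        rewards = {"high": self.final_reward, "low": 0.0}
        return obs, rewards, {"__all__": True}, {}


def run_episode(env, low_action=0):
    """Roll out one episode with dummy actions; return the high-level rewards."""
    obs = env.reset()
    high_rewards = []
    done = False
    while not done:
        # Act only for the agents that received an observation this step.
        actions = {agent: (low_action if agent == "low" else 0) for agent in obs}
        obs, rewards, dones, _ = env.step(actions)
        high_rewards.append(rewards.get("high", 0.0))
        done = dones["__all__"]
    return high_rewards
```

With n1=3 and n2=2, run_episode returns zeros for the first four steps and the final reward on the fifth, which matches the "reward only at N1+N2" description. Passing low_action=STOP ends the low-level phase after its first step.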
However, I can’t see how to do this with the current RLlib implementation.

Is there a way? Am I missing it?

Hi iykim,
What issues are you running into with RLlib that prevent you from setting this up? What errors are you seeing, and do you have any reproducible code we can try out?
Christina