Hi. I want to try a special multi-agent environment, specifically a hierarchical one.
The high-level agent runs for the first N1 steps.
After that, the low-level agent runs for at most N2 steps.
After the low-level agent finishes (at step N1+N2 at the latest), the reward is given to the high-level agent. All rewards before that point are 0. So basically the low-level agent's episode is executed within the high-level agent's last step.
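
To make the schedule concrete, here is a minimal plain-Python sketch of what I have in mind. It does not import RLlib. The class name `TwoPhaseEnv`, the agent ids `"high"` and `"low"`, the `STOP_ACTION` value, and the placeholder reward of 1.0 per low-level step are all made up for illustration. The per-agent dicts only loosely mirror the multi-agent env convention, and I have not checked whether RLlib would accept an environment shaped exactly like this.

```python
STOP_ACTION = 0  # hypothetical action that lets the low-level agent stop early


class TwoPhaseEnv:
    """High-level agent acts for n1 steps, then low-level agent for at most n2 steps."""

    def __init__(self, n1=3, n2=5):
        self.n1, self.n2 = n1, n2

    def reset(self):
        self.t_high = 0        # steps taken by the high-level agent
        self.t_low = 0         # steps taken by the low-level agent
        self.low_return = 0.0  # reward accumulated during the low-level phase
        return {"high": 0}     # only the high-level agent acts first

    def step(self, actions):
        if self.t_high < self.n1:
            # High-level phase: reward is always 0 here.
            self.t_high += 1
            if self.t_high < self.n1:
                return {"high": self.t_high}, {"high": 0.0}, {"__all__": False}
            # Last high-level step: hand control to the low-level agent.
            return {"low": 0}, {"high": 0.0}, {"__all__": False}

        # Low-level phase.
        self.t_low += 1
        stopped = actions["low"] == STOP_ACTION
        reward = 0.0 if stopped else 1.0  # placeholder task reward
        self.low_return += reward
        if not stopped and self.t_low < self.n2:
            return {"low": self.t_low}, {"low": reward}, {"__all__": False}

        # Low-level episode is over: only now does the high-level agent get paid.
        obs = {"high": self.t_high, "low": self.t_low}
        rewards = {"high": self.low_return, "low": reward}
        return obs, rewards, {"__all__": True}


def rollout(env):
    """Run one episode with a dummy policy that always picks action 1."""
    obs = env.reset()
    steps, high_rewards, done = 0, [], False
    while not done:
        actions = {agent: 1 for agent in obs}
        obs, rewards, dones = env.step(actions)
        if "high" in rewards:
            high_rewards.append(rewards["high"])
        done = dones["__all__"]
        steps += 1
    return steps, high_rewards
```

With N1=3 and N2=5, `rollout(TwoPhaseEnv(3, 5))` takes 8 environment steps in total, and the high-level agent sees a nonzero reward only on the last one.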
However, I can't see how to do this with the current RLlib implementation.
Is there a way to do it? Am I missing something?