Hey guys. Assume I have a two-level HRL algorithm, where an upper-level agent decides which of two subtasks to perform. For each subtask, I want to use multi-agent RL to solve it, so each subtask has its own set of agents. The RL workflow is: at each time step, the upper-level agent first chooses a subtask. Then the agents of the chosen subtask act simultaneously for several steps to solve it. When the subtask is solved, the upper-level agent gets the reward.
Can anyone give me some advice on how to implement my idea?
I think you could adapt this example. Maybe inside your environment you keep two sub-environments and switch between them based on the action of the higher-level agent.
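To make the switching idea a bit more concrete, here is a rough, library-agnostic sketch in plain Python. Everything in it is made up for illustration: `ToySubtaskEnv`, `HierarchicalEnv`, the `"high_level"` agent id, the toy "all agents pick the target action" success rule, and the choice to end the episode once one subtask finishes. It is not the API of any particular RL library, so you would still need to map it onto whatever multi-agent environment interface you use.

```python
from typing import Dict, List


class ToySubtaskEnv:
    """Hypothetical multi-agent subtask (assumption, not from the thread).

    Solved when every agent picks `target` in the same step;
    gives up after `max_steps` steps.
    """

    def __init__(self, agent_ids: List[str], target: int, max_steps: int = 5):
        self.agent_ids = list(agent_ids)
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Dict[str, int]:
        self.steps = 0
        return {aid: 0 for aid in self.agent_ids}

    def step(self, actions: Dict[str, int]):
        self.steps += 1
        solved = all(actions[aid] == self.target for aid in self.agent_ids)
        done = solved or self.steps >= self.max_steps
        obs = {aid: self.steps for aid in self.agent_ids}
        rewards = {aid: (1.0 if solved else 0.0) for aid in self.agent_ids}
        return obs, rewards, done, solved


class HierarchicalEnv:
    """Keeps one sub-environment per subtask and switches between them.

    Only the agents that appear in the returned observation dict are
    expected to act on the next step.
    """

    HIGH = "high_level"  # hypothetical id for the upper-level agent

    def __init__(self, sub_envs: Dict[int, ToySubtaskEnv]):
        self.sub_envs = sub_envs
        self.active = None  # index of the running subtask, None = high-level turn

    def reset(self) -> Dict[str, int]:
        self.active = None
        return {self.HIGH: 0}

    def step(self, actions: Dict[str, int]):
        if self.active is None:
            # High-level turn: the action selects which subtask to run.
            self.active = actions[self.HIGH]
            obs = self.sub_envs[self.active].reset()
            return obs, {}, False

        # Low-level turn: all agents of the active subtask act simultaneously.
        obs, rewards, sub_done, solved = self.sub_envs[self.active].step(actions)
        if not sub_done:
            return obs, rewards, False

        # Subtask finished: only now does the upper-level agent get its reward.
        rewards = dict(rewards)
        rewards[self.HIGH] = 1.0 if solved else 0.0
        self.active = None
        # Assumption: the episode ends after a single subtask.
        return {self.HIGH: 0}, rewards, True
```

You would drive it by calling `reset()`, sending `{"high_level": k}` to pick subtask `k`, and then sending one action per agent of that subtask until the last return value is `True`. If you want several high-level decisions per episode, return `False` there instead and let the upper-level agent act again on the next step.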