How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
The WindyMaze environment is the only hierarchical environment example I found (https://github.com/ray-project/ray/blob/master/rllib/examples/env/windy_maze_env.py). In this environment, the top-level agent (the wind) does not get a reward and acts randomly, while the low-level agent selects an action: move in the wind direction or stay still.

In my model, I would like the top-level agent to select an action from {a, b, c}. The low-level agent associated with the selected action (say a) would then choose that action's parameters and take action a, and both the top-level and low-level agents would share the resulting reward and observation. I'm wondering how to achieve this. I am using a custom agent selector to order the agents so that after each step of the top-level agent, the corresponding low-level agent takes its turn. But I don't know how to pass the reward and observation to the top-level agent after the low-level agent has received them.
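A minimal sketch of one possible approach, assuming the dict-based `step()` convention used by the hierarchical env in windy_maze_env.py: exactly one agent acts per step, an agent is asked to act when it appears in the returned observation dict, and `done["__all__"]` ends the episode. After the low-level agent acts, the env adds the top-level agent to both the observation and reward dicts with the same values, so the top-level agent receives the shared reward and observation and is asked to act next. Like the WindyMaze env, it creates a new low-level agent id for every high-level step, and a hypothetical helper `policy_for` maps ids such as `low_level_a_3` back to one policy per action. `FlatParamEnv`, its dynamics and reward, and all other names here are made up for illustration. The code is plain Python and does not import Ray.

```python
import random
from typing import Any, Dict, Tuple

HIGH = "high_level_agent"
ACTION_KINDS = ("a", "b", "c")


class FlatParamEnv:
    """Hypothetical flat env: each step takes (kind, param) with kind in {a, b, c}."""

    def __init__(self, max_steps: int = 4):
        self.max_steps = max_steps

    def reset(self) -> float:
        self.t = 0
        self.state = 0.0
        return self.state

    def step(self, kind: str, param: float) -> Tuple[float, float, bool]:
        # Made-up dynamics: "a" pushes up, "b" pushes down, "c" does nothing.
        self.t += 1
        self.state += {"a": 1.0, "b": -1.0, "c": 0.0}[kind] * param
        reward = -abs(self.state - 2.0)  # closer to 2.0 is better
        done = self.t >= self.max_steps
        return self.state, reward, done


class HierarchicalParamEnv:
    """Top level picks an action kind; the matching low-level agent picks its parameter."""

    def __init__(self, max_steps: int = 4):
        self.flat_env = FlatParamEnv(max_steps)

    def reset(self) -> Dict[str, Any]:
        self.cur_obs = self.flat_env.reset()
        self.kind = None
        self.num_high_steps = 0
        self.low_id = None
        return {HIGH: self.cur_obs}

    def step(self, action_dict: Dict[str, Any]):
        # Exactly one agent acts per step, as in the WindyMaze example.
        assert len(action_dict) == 1, action_dict
        if HIGH in action_dict:
            return self._high_level_step(action_dict[HIGH])
        return self._low_level_step(action_dict[self.low_id])

    def _high_level_step(self, kind: str):
        self.kind = kind
        self.num_high_steps += 1
        # New agent id per high-level step; the prefix identifies the low-level policy.
        self.low_id = f"low_level_{kind}_{self.num_high_steps}"
        obs = {self.low_id: (self.cur_obs, kind)}
        rew = {self.low_id: 0.0}
        return obs, rew, {"__all__": False}, {}

    def _low_level_step(self, param: float):
        f_obs, f_rew, f_done = self.flat_env.step(self.kind, param)
        self.cur_obs = f_obs
        # Give the SAME observation and reward to both agents; including HIGH in
        # the obs dict means the top-level agent is the one asked to act next.
        obs = {self.low_id: (f_obs, self.kind), HIGH: f_obs}
        rew = {self.low_id: f_rew, HIGH: f_rew}
        # The low-level agent is finished after its single action.
        done = {self.low_id: True, "__all__": f_done}
        return obs, rew, done, {}


def policy_for(agent_id: str) -> str:
    """Hypothetical mapping helper: 'low_level_a_3' -> 'low_level_a'."""
    if agent_id == HIGH:
        return HIGH
    return agent_id.rsplit("_", 1)[0]


if __name__ == "__main__":
    env = HierarchicalParamEnv()
    obs = env.reset()
    done = {"__all__": False}
    while not done["__all__"]:
        obs, rew, done, _ = env.step({HIGH: random.choice(ACTION_KINDS)})
        (low_id,) = obs.keys()
        obs, rew, done, _ = env.step({low_id: random.uniform(0.0, 1.0)})
        print(policy_for(low_id), rew)
```

The design choice here is to hand the top-level agent its observation and reward in the same return value as the low-level agent's final transition, which avoids any extra bookkeeping in the agent selector. If the top-level agent should instead receive a differently shaped reward (for example, the sum over several low-level steps), that value would go into `rew[HIGH]` at the same place.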
I would be grateful for any ideas/suggestions.