We're working on a project where we would like to write several low-level rule-based policies and feed them into a top-level reinforcement learning policy to decide on an action. We've implemented Policy classes for each of the low-level policies that decide on an action based on the received observation.
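To make that concrete, here is the rough shape of one of our rule-based policies, stripped down to pure Python (in the real code this logic lives inside the `compute_actions` of an RLlib Policy subclass; the observation key and threshold below are made-up placeholders):

```python
def avoid_obstacle_rule(observation):
    """Map an observation dict to a discrete action using fixed rules.

    Placeholder logic: turn if something is close ahead, else go straight.
    """
    distance_ahead = observation["distance_ahead"]  # placeholder obs key
    if distance_ahead < 1.0:
        return 1  # turn
    return 0  # go straight
```

Each low-level policy is just a deterministic function of the observation like this; there is nothing to train in them.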
We’ve also implemented an environment wrapper that alternates between stepping the low-level and top-level policies in a similar way to the example in https://github.com/ray-project/ray/blob/master/rllib/examples/env/windy_maze_env.py.
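For reference, our wrapper's step logic looks roughly like this (a pure-Python sketch with no gym/ray imports; all method and attribute names are placeholders for our own code). Each step we query the rule-based policies first and hand their suggested actions to the top-level agent as part of its observation:

```python
class HierarchicalWrapper:
    """Sketch of our env wrapper: low-level rule-based policies run first,
    and their suggestions are folded into the top-level observation."""

    def __init__(self, base_env, low_level_policies):
        self.base_env = base_env
        self.low_level_policies = low_level_policies  # list of obs -> action fns
        self.last_obs = None

    def reset(self):
        self.last_obs = self.base_env.reset()
        return self._top_level_obs()

    def _top_level_obs(self):
        # query every rule-based policy on the current raw observation
        suggestions = [policy(self.last_obs) for policy in self.low_level_policies]
        return {"raw_obs": self.last_obs, "suggested_actions": suggestions}

    def step(self, top_level_action):
        # the top-level policy's choice is what actually gets executed
        self.last_obs, reward, done, info = self.base_env.step(top_level_action)
        return self._top_level_obs(), reward, done, info
```

This is the same alternating idea as the windy maze example, except our "low level" is fixed rules rather than a learned policy.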
We would like to use one of RLlib's provided trainers, e.g. IMPALA, to train the top-level policy, but we're unsure how to set this up in tune.run() so that it works alongside our rule-based policies. Is this possible with any of the built-in trainers, or would we have to implement our own Trainer class?
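Concretely, here is roughly the multiagent config we were hoping to pass to tune.run("IMPALA", config=...). This is a pure-Python sketch (no ray import, so the policy class and spaces below are stubs), and we're assuming that listing only the RL policy under "policies_to_train" keeps the rule-based ones out of the optimization loop, which is the part we're unsure about:

```python
class RuleBasedPolicy:
    """Stand-in for our rllib.policy.Policy subclass."""
    pass

# placeholders for our real observation/action spaces
top_obs_space = top_act_space = low_obs_space = low_act_space = None

config = {
    "env": "HierarchicalWrapperEnv",  # placeholder name for our registered env
    "multiagent": {
        "policies": {
            # trained by the IMPALA trainer (None -> trainer's default policy class)
            "top_level": (None, top_obs_space, top_act_space, {}),
            # frozen rule-based policy: acts every step, never trained
            "rule_based": (RuleBasedPolicy, low_obs_space, low_act_space, {}),
        },
        "policy_mapping_fn": lambda agent_id: (
            "top_level" if agent_id.startswith("top") else "rule_based"
        ),
        # our hope: only the RL policy receives gradient updates
        "policies_to_train": ["top_level"],
    },
}
```

Does this kind of setup work out of the box, or does the trainer assume every policy in "policies" is trainable?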
Later on we'd like to have several of these hierarchical agents acting in our environment, so we'd also appreciate some advice on how to set that up.
Thanks for your help!