How to implement meta RL

I am trying to implement meta RL, for example an outer RL loop wrapped around an inner RL loop. How would one get started with this in RLlib? I am looking for ideas and guidance. Thanks!

Hey @RickLan, could you take a look at our MAML implementation (rllib/agents/maml/...)? @michaelzhiluo already solved the problem of the separate inner and outer loops in there. :slight_smile:
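To make the two-loop structure concrete, here is a minimal NumPy sketch. It is not RLlib's MAML code and uses no RLlib API. It assumes a toy multi-armed bandit task distribution, a softmax policy trained with REINFORCE in the inner loop, and a first-order, Reptile-style meta-update in the outer loop instead of MAML's second-order gradient. The names `sample_task`, `inner_loop`, and `meta_train` are hypothetical and exist only for this illustration.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def sample_task(rng, n_arms):
    # Hypothetical task distribution: a bandit with one randomly chosen good arm.
    means = np.zeros(n_arms)
    means[rng.integers(n_arms)] = 1.0
    return means


def inner_loop(theta, task_means, rng, steps=5, lr=0.5, batch=16, noise=0.0):
    # Inner loop: adapt a copy of the meta-parameters to a single task
    # with a few REINFORCE steps.
    phi = theta.copy()
    n_arms = len(phi)
    for _ in range(steps):
        probs = softmax(phi)
        actions = rng.choice(n_arms, size=batch, p=probs)
        rewards = task_means[actions] + noise * rng.standard_normal(batch)
        adv = rewards - rewards.mean()  # batch-mean baseline
        # For a softmax policy, grad of log pi(a) w.r.t. the logits
        # is onehot(a) - probs.
        grad_logp = np.eye(n_arms)[actions] - probs
        phi += lr * (adv[:, None] * grad_logp).mean(axis=0)
    return phi


def meta_train(n_arms=3, meta_iters=50, tasks_per_iter=4, meta_lr=0.1, seed=0):
    # Outer loop: sample tasks, adapt to each one, then move the
    # meta-parameters toward the adapted parameters (Reptile-style,
    # first-order update; an assumption made for this sketch).
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_arms)
    for _ in range(meta_iters):
        deltas = []
        for _ in range(tasks_per_iter):
            task = sample_task(rng, n_arms)
            phi = inner_loop(theta, task, rng)
            deltas.append(phi - theta)
        theta += meta_lr * np.mean(deltas, axis=0)
    return theta
```

Calling `meta_train()` returns the meta-learned logits, and `inner_loop(theta, task, rng)` then adapts them to a new task. In a real setup the inner loop would be a full RL training step on an environment sampled from the task distribution, which is the part the MAML implementation mentioned above handles.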

Thank you for the quick reply. I will take a look there!

Hi @RickLan,

Another place worth looking at is the hierarchical training example. It essentially boils down to an outer loop and an inner loop.

Thanks, @mannyv ! Will look as well.