Thank you @arturn for your quick response; it’s helpful.
I understand that a MAML policy needs to be fine-tuned, and that this can be done directly with RLlib's PPO algorithm (this thread mentions it, and it has also been tested: MAML finetune adaptation step for inference).
Is there a better approach in RLlib for fine-tuning an MBMPO policy?
Regards