When running inference with a MAML-based policy, how does the fine-tune adaptation step happen for a new meta-test task?
At inference time, the policy simply starts from the meta-learned prior (the weights after MAML training) and runs standard RL training on the new task; that training is the “adaptation step”. The agent is expected to adapt quickly, since the whole purpose of meta-learning is to teach the agent to adapt quickly to new environments.
How does the MAML global gradient perform the 1-step gradient update that fine-tunes the weights to the new meta-test task?
There is a difference between MAML training and MAML test. Fine-tuning the weights has nothing to do with MAML’s global gradient.
MAML training is covered by RLlib with our MAML agent; this is where the meta-gradient is computed.
For MAML test, you can take the weights from a trained MAML agent and adapt them to the test task with another round of training (PPOTrainer is recommended, since we implement MAML-PPO, where PPO is the inner update step).
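As a minimal sketch (assuming the Ray 1.x Trainer API, where MAMLTrainer lives in ray.rllib.agents.maml; the environment name and checkpoint path below are placeholders for your own meta-test env and MAML checkpoint), the test-time recipe might look like this:

```python
import ray
from ray.rllib.agents.maml import MAMLTrainer
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Recover the meta-learned prior from a finished MAML training run.
maml = MAMLTrainer(env="HalfCheetahRandDirecEnv")  # placeholder env name
maml.restore("/path/to/maml_checkpoint")           # placeholder checkpoint path
meta_weights = maml.get_policy().get_weights()

# Adapt to the new meta-test task with plain PPO (the inner update step
# of MAML-PPO), starting from the meta-learned weights.
ppo = PPOTrainer(env="HalfCheetahRandDirecEnv")
ppo.get_policy().set_weights(meta_weights)

for _ in range(10):  # 10+ adaptation steps, per the recommendation below
    result = ppo.train()
```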
Also, how many steps does the agent need to sample in the meta-test environment to perform the fine-tune gradient update step? Will this be equal to the rollout fragment length?
The number of steps sampled in the environment is the environment horizon in timesteps (1000 for HalfCheetah, for example) times the number of episodes you want to collect per adaptation step for the meta-learned agent. If you want your agent to fully adapt to a new test environment, I recommend 10+ adaptation steps/RL iterations.
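For instance, with the HalfCheetah numbers above and a hypothetical choice of 10 episodes per adaptation step, the arithmetic works out as:

```python
horizon = 1000           # timesteps per HalfCheetah episode
episodes_per_adapt = 10  # episodes collected per adaptation step (your choice)
adapt_steps = 10         # 10+ recommended for full adaptation

steps_per_adapt = horizon * episodes_per_adapt  # 10,000 env steps per adaptation step
total_steps = steps_per_adapt * adapt_steps     # 100,000 env steps overall
```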
It will be equal to the rollout fragment length if you set batch_mode to complete_episodes in the config.
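Concretely, a config along these lines (a sketch with hypothetical values, using the Ray 1.x config keys) makes each sampled fragment exactly one complete episode:

```python
config = {
    "horizon": 1000,                    # episode length of the test env
    "rollout_fragment_length": 1000,    # set equal to the horizon
    "batch_mode": "complete_episodes",  # fragments contain whole episodes, so here
                                        # sampled steps == rollout_fragment_length
}
```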