How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
I am wondering how to control the inner-loop gradient update in MAML. In PPO, we have parameters such as `sgd_minibatch_size` and `num_sgd_iter` to control the minibatch size and the number of SGD iterations during training.
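For reference, this is roughly the PPO configuration I have in mind (the values are just placeholders):

```python
# Sketch of the PPO settings I am referring to (example values only):
ppo_config = {
    "train_batch_size": 4000,    # total samples collected per training iteration
    "sgd_minibatch_size": 128,   # size of each SGD minibatch
    "num_sgd_iter": 30,          # number of SGD passes over the train batch
}
```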
In MAML, however, we have the parameters `train_batch_size` and `inner_adaptation_steps`. How do they control the inner-loop gradient update? My understanding is that with `inner_adaptation_steps = 1`, we collect all the samples of one episode and then perform a gradient update. But how does the update process work exactly? Do we perform several iterations of minibatch updates as in PPO, or is it a single gradient update using all samples of one episode? And what exactly is the role of `train_batch_size` in this process?
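And this is the MAML configuration I am asking about (again, the values are placeholders, and I am not sure they are sensible):

```python
# Sketch of the MAML settings in question (example values only):
maml_config = {
    "train_batch_size": 200,      # how this relates to the inner update is unclear to me
    "inner_adaptation_steps": 1,  # number of inner-loop adaptation steps per task
}
```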
Thanks for your help!