How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
I am wondering how to control the inner-loop gradient update in MAML. In PPO, we have parameters such as `sgd_minibatch_size` and `num_sgd_iter` to control the minibatch size and the number of SGD iterations during training.
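For reference, this is roughly the PPO configuration I have in mind (the values are just placeholders):

```python
# Sketch of the PPO settings I am referring to (example values only):
ppo_config = {
    "train_batch_size": 4000,    # total samples collected per training iteration
    "sgd_minibatch_size": 128,   # size of each SGD minibatch
    "num_sgd_iter": 30,          # number of SGD passes over the train batch
}
```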
In MAML, however, we have the parameters `train_batch_size` and `inner_adaptation_steps`. How do they control the inner-loop gradient update? My understanding is that with `inner_adaptation_steps = 1`, we collect all the samples of one episode and then perform a gradient update. But how does the update process work exactly? Do we perform several iterations of minibatch updates as in PPO, or is it a single gradient update using all samples of one episode? And what exactly is the role of `train_batch_size` in this process?
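And this is the MAML configuration I am asking about (again, the values are placeholders, and I am not sure they are sensible):

```python
# Sketch of the MAML settings in question (example values only):
maml_config = {
    "train_batch_size": 200,      # how this relates to the inner update is unclear to me
    "inner_adaptation_steps": 1,  # number of inner-loop adaptation steps per task
}
```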
Thanks for your help!