[RLlib] updating batch_size or similar while training

I know that one can update the learning rate over the course of training using lr_schedule, and I also noticed entropy_coeff_schedule. These seem to have special handling within the policy classes.
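For context, this is roughly how those two schedules get specified, as far as I understand them: lists of [timestep, value] pairs in the PPO config, interpolated over training. A minimal sketch, assuming an RLlib 1.x-style API; the environment, timesteps, and values are just placeholders:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": "CartPole-v0",
    # Linearly anneal the learning rate from 5e-4 down to 1e-5 over 1M timesteps.
    "lr_schedule": [[0, 5e-4], [1000000, 1e-5]],
    # Anneal the entropy bonus from 0.01 down to 0.0 over the same horizon.
    "entropy_coeff_schedule": [[0, 0.01], [1000000, 0.0]],
}

trainer = PPOTrainer(config=config)
result = trainer.train()
```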

Is there a more general approach to modifying other training parameters, e.g. batch_size or maybe num_sgd_iter?

Or perhaps even updating some of your environment config over time? I know this makes for non-stationary training, but I can imagine cases where this might be useful.

Is the only approach to stop training and restart it with a new config?

Currently, unfortunately, yes. There is no generic way of wrapping schedules around arbitrary config keys.

I’m also trying to schedule the training batch size. Do you think it could still be done using a custom callback or some other hack? Some papers from OpenAI and Google Brain suggest that it may be beneficial to schedule the batch size rather than the learning rate, and I would like to put those conclusions to the test.

Maybe not the cleanest solution, but you could train for a period, extract the weights from your trainer (trainer.get_weights()), modify the config (changing the batch size), build a new trainer with the modified config, set the weights on it (I think you need to set them on both the trainer and its remote workers), and then resume training.
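Something along these lines, as a rough sketch (assuming an RLlib 1.x-style API; the env, iteration counts, and batch sizes are just placeholders):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {"env": "CartPole-v0", "train_batch_size": 4000}
trainer = PPOTrainer(config=config)

# Phase 1: train for a while with the initial batch size.
for _ in range(10):
    trainer.train()

# Grab the current policy weights, then shut the old trainer down.
weights = trainer.get_weights()
trainer.stop()

# Phase 2: rebuild with a modified batch size and restore the weights.
new_config = dict(config, train_batch_size=8000)
new_trainer = PPOTrainer(config=new_config)
new_trainer.set_weights(weights)
# Also push the weights to the remote rollout workers, not just the local one.
new_trainer.workers.foreach_worker(lambda w: w.set_weights(weights))

for _ in range(10):
    new_trainer.train()
```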

Yes, I thought about this, but I would rather avoid resetting the internal momentum parameters of the low-level Adam optimizer that is used in place of vanilla SGD, if I’m not mistaken. But yes, maybe just saving the trainer, modifying the backup, then loading the backup is the only way.

After a thorough investigation of the source code, it appears impossible to avoid building a new optimizer when updating the training batch size, because the value is “hard-coded” in the execution_plan method of PPO. So I think the best way may be to call Trainer.with_updates, which can be used to build a new trainer that overrides some parameters, and to use Trainer.__getstate__ / Trainer.__setstate__ to restore the internal state of the trainer.
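Roughly what I have in mind, as a sketch (assuming an RLlib 1.x-style API; whether the Adam momentum buffers actually survive the __getstate__ / __setstate__ round trip may depend on the framework and RLlib version, so it would need to be verified):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

base_config = {"env": "CartPole-v0", "train_batch_size": 4000}
trainer = PPOTrainer(config=base_config)
for _ in range(10):
    trainer.train()

# Capture the trainer's full internal state (policy weights, filters,
# counters, ...) before tearing it down.
state = trainer.__getstate__()
trainer.stop()

# Build a fresh trainer with the new batch size and restore the saved state.
new_trainer = PPOTrainer(config=dict(base_config, train_batch_size=8000))
new_trainer.__setstate__(state)

# If something baked into the trainer class itself needs changing (e.g. its
# execution_plan), with_updates() can derive a modified class first, e.g.:
#   CustomPPO = PPOTrainer.with_updates(execution_plan=my_execution_plan)

for _ in range(10):
    new_trainer.train()
```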