[RLlib] updating batch_size or similar while training

I know that one can update the learning rate over the course of training using lr_schedule, and I also noticed entropy_coeff_schedule. These seem to have special handling within the policy classes.
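For context, this is roughly how those two schedules get specified, as far as I understand them: lists of [timestep, value] pairs in the PPO config, interpolated over training. A minimal sketch, assuming an RLlib 1.x-style API; the environment, timesteps, and values are just placeholders:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "env": "CartPole-v0",
    # Linearly anneal the learning rate from 5e-4 down to 1e-5 over 1M timesteps.
    "lr_schedule": [[0, 5e-4], [1000000, 1e-5]],
    # Anneal the entropy bonus from 0.01 down to 0.0 over the same horizon.
    "entropy_coeff_schedule": [[0, 0.01], [1000000, 0.0]],
}

trainer = PPOTrainer(config=config)
result = trainer.train()
```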

Is there a more general approach to modifying other training parameters, e.g. batch_size or maybe num_sgd_iter?

Or perhaps even updating some of your environment config over time? I know this makes for non-stationary training, but I can imagine cases where this might be useful.

Is the only approach to stop training and restart it with a new config?

Currently, unfortunately, yes. There is no generic way of wrapping schedules around arbitrary config keys.

I’m also trying to schedule the training batch size. Do you think it could still be done using a custom callback or some other hack? Some papers from OpenAI and Google Brain suggest that it may be beneficial to schedule the batch size rather than the learning rate, and I would like to put those conclusions to the test.

Maybe not the cleanest solution, but you could train for a period, extract the weights from your trainer (trainer.get_weights()), modify the config (changing the batch size), build a new trainer with the modified config, set the weights on it (I think you need to set them on both the trainer and its remote workers), and then resume training.
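Something along these lines, as a rough sketch (assuming an RLlib 1.x-style API; the env, iteration counts, and batch sizes are just placeholders):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {"env": "CartPole-v0", "train_batch_size": 4000}
trainer = PPOTrainer(config=config)

# Phase 1: train for a while with the initial batch size.
for _ in range(10):
    trainer.train()

# Grab the current policy weights, then shut the old trainer down.
weights = trainer.get_weights()
trainer.stop()

# Phase 2: rebuild with a modified batch size and restore the weights.
new_config = dict(config, train_batch_size=8000)
new_trainer = PPOTrainer(config=new_config)
new_trainer.set_weights(weights)
# Also push the weights to the remote rollout workers, not just the local one.
new_trainer.workers.foreach_worker(lambda w: w.set_weights(weights))

for _ in range(10):
    new_trainer.train()
```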

Yes, I thought about this, but I would rather avoid resetting the internal momentum parameters of the low-level Adam optimizer that is used in place of vanilla SGD, if I’m not mistaken. But yes, maybe just saving the trainer, modifying the backup, then loading the backup is the only way.

After a thorough investigation of the source code, it appears impossible to avoid building a new optimizer when updating the training batch size, because the value is “hard-coded” in the execution_plan method of PPO. So I think the best way may be to call Trainer.with_updates, which can be used to build a new trainer that overrides some parameters, and to use Trainer.__getstate__ / Trainer.__setstate__ to restore the internal state of the trainer.
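Roughly what I have in mind, as a sketch (assuming an RLlib 1.x-style API; whether the Adam momentum buffers actually survive the __getstate__ / __setstate__ round trip may depend on the framework and RLlib version, so it would need to be verified):

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

base_config = {"env": "CartPole-v0", "train_batch_size": 4000}
trainer = PPOTrainer(config=base_config)
for _ in range(10):
    trainer.train()

# Capture the trainer's full internal state (policy weights, filters,
# counters, ...) before tearing it down.
state = trainer.__getstate__()
trainer.stop()

# Build a fresh trainer with the new batch size and restore the saved state.
new_trainer = PPOTrainer(config=dict(base_config, train_batch_size=8000))
new_trainer.__setstate__(state)

# If something baked into the trainer class itself needs changing (e.g. its
# execution_plan), with_updates() can derive a modified class first, e.g.:
#   CustomPPO = PPOTrainer.with_updates(execution_plan=my_execution_plan)

for _ in range(10):
    new_trainer.train()
```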