Population Based Training and optimizer checkpointing

Hi all,

I am trying to run PBT to determine the optimal learning rate for my model; my question is about checkpointing the optimizer. I am currently using AdamW (in PyTorch), and as mentioned here it would be nice to save the previous optimizer state. However, restoring a checkpoint seems to overwrite the learning rate from the config, so the learning rate would never be able to mutate. Does Ray Tune get around this somehow? If it does, I am curious how. Or do you need to ensure the learning rate is updated from the config after restoring a checkpoint? Thanks in advance.

Brandon

Yeah, you just need to ensure that the learning rate is updated from the config after restoring from a checkpoint. `optimizer.load_state_dict()` restores the saved learning rate in `param_groups` along with AdamW's moment estimates, so you have to overwrite it with the config value afterwards; otherwise the restored value silently wins and PBT's mutations never take effect.
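
Here is a minimal sketch of that pattern, assuming a function-based trainable with the older `checkpoint_dir`-style restore hook; `train_fn`, `checkpoint.pt`, and the toy model are placeholders for your own code:

```python
import os

import torch
import torch.nn as nn


def train_fn(config, checkpoint_dir=None):
    # Toy model standing in for your real network.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"])

    if checkpoint_dir is not None:
        state = torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
        model.load_state_dict(state["model"])
        # Restores AdamW's moment estimates -- but it also restores the
        # learning rate that was saved inside param_groups.
        optimizer.load_state_dict(state["optimizer"])
        # Re-apply the (possibly mutated) learning rate from the config
        # so PBT's perturbations actually take effect.
        for param_group in optimizer.param_groups:
            param_group["lr"] = config["lr"]

    # ... training loop: step, report metrics, save checkpoints ...
```

The loop over `optimizer.param_groups` at the end is the key part: the model weights and AdamW's running moments survive the restore, while the learning rate follows whatever value PBT has put in the config.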