- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hello all,
Background: I have successfully used the HyperOpt tuner to tune my PPO hyper-parameters for a project of mine, but only with static learning rates. After many hours of research I have found that not using a learning rate schedule (specifically an exponential one) eventually causes the PPO rewards to collapse. When using PPOTrainer for a single hyper-parameter setting, I work around this manually (sketched below): I save a checkpoint every 10 training batches, stop training, modify "lr" in the config dict so that lr_new = lr_prev * 0.99, resume training from the last saved checkpoint, and repeat this for 4000-8000 training batch updates. The problem is that I can't implement this manual workaround with HyperOpt, because HyperOpt does not run each hyper-parameter setting sequentially but in parallel.
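For reference, the manual loop looks roughly like this (a minimal sketch, assuming the old-style ray.rllib.agents API; base_config, SELECT_ENV and the 400x10 loop structure are placeholders for my actual setup):

from ray.rllib.agents.ppo import PPOTrainer

lr = 1e-5
checkpoint = None
for block in range(400):  # e.g. 400 blocks of 10 training batches each
    # Rebuild the trainer with the decayed learning rate and restore the last weights.
    trainer = PPOTrainer(config=dict(base_config, lr=lr), env=SELECT_ENV)
    if checkpoint is not None:
        trainer.restore(checkpoint)
    for _ in range(10):  # 10 training batches per block
        trainer.train()
    checkpoint = trainer.save()  # save before tearing the trainer down
    trainer.stop()
    lr *= 0.99  # exponential decay: lr_new = lr_prev * 0.99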
Question: is there a way to define in the HyperOpt config dict a tunable exponential learning rate schedule? essentially the exponential learning rate schedule would be defined by 2 parameters (a & b) and a variable N; lr=aexp(-bN) where a is the initial lr, b is the rate of decay and N is episodes/batches/steps etc.
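For illustration only, the kind of thing I am hoping to express looks roughly like this (a hypothetical sketch using tune.sample_from and RLlib's piecewise-linear "lr_schedule"; I don't know whether HyperOptSearch can actually handle a sampled schedule like this, which is essentially my question):

import numpy as np
from ray import tune

def make_lr_schedule(a, b, total_steps=5_000_000, points=50):
    # Discretise lr = a * exp(-b * t / total_steps) into the [[timestep, lr], ...]
    # pairs that RLlib's "lr_schedule" interpolates between.
    steps = np.linspace(0, total_steps, points)
    return [[int(t), float(a * np.exp(-b * t / total_steps))] for t in steps]

config_fragment = {
    "lr_schedule": tune.sample_from(
        lambda spec: make_lr_schedule(
            a=np.random.uniform(1e-6, 1e-5),  # tunable initial learning rate
            b=np.random.uniform(1.0, 5.0),    # tunable decay rate
        )
    ),
}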
Below is an example of my current Ray Tune experiment setup:
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

alg = HyperOptSearch(metric="episode_reward_mean", mode="max")

analysis = tune.run(
    "PPO",
    stop={"episodes_total": target},
    metric="episode_reward_mean",
    mode="max",
    config={
        "env": SELECT_ENV,
        "num_workers": 8,
        "lr": tune.uniform(1e-6, 1e-5),
        "gamma": tune.uniform(0.9, 0.99),
        "lambda": tune.uniform(0.9, 0.99),
        "train_batch_size": tune.randint(2048, 8192),  # must be an integer, so randint rather than uniform
        "num_gpus": 0,
        "model": {"fcnet_hiddens": tune.choice([[256, 256], [512, 512], [1024, 1024]])},
    },
    search_alg=alg,
    num_samples=20,
)
Thanks in advance