Medium: It adds significant difficulty to completing my task, but I can work around it.
Hello all,
Background: I have successfully used the HyperOpt tuner to tune my PPO hyperparameters for a project of mine, but only with static learning rates. After many hours of research I have found that not using a learning-rate schedule (specifically an exponential one) eventually causes the PPO algorithm's rewards to collapse. When using PPOTrainer for a single hyperparameter setting, I basically save the checkpoint data every 10 training batches, stop the training process, manually modify the "lr" in the config dict so that lr_new = lr_prev * 0.99, resume training from the last saved checkpoint, and keep doing that for 4000-8000 training-batch updates. The problem is that I can't implement this manual workaround with HyperOpt, because HyperOpt does not run each hyperparameter setting sequentially but rather in parallel.
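Concretely, that manual loop looks roughly like the sketch below (a minimal sketch only: the environment, initial lr, and round counts are illustrative, and save()/restore() are assumed to return/take a checkpoint path, which can differ between RLlib versions):

from ray.rllib.algorithms.ppo import PPOConfig

lr = 3e-4           # illustrative initial learning rate
checkpoint = None   # path of the last saved checkpoint

for round_idx in range(400):          # e.g. 400 rounds x 10 batches = 4000 updates
    algo = (
        PPOConfig()
        .environment("CartPole-v1")   # placeholder env
        .training(lr=lr)
        .build()
    )
    if checkpoint is not None:
        algo.restore(checkpoint)      # resume from the last saved checkpoint
    for _ in range(10):               # 10 training batches per round
        algo.train()
    checkpoint = algo.save()          # save, then restart with a decayed lr
    algo.stop()
    lr *= 0.99                        # lr_new = lr_prev * 0.99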
Question: is there a way to define, in the HyperOpt config dict, a tunable exponential learning-rate schedule? Essentially, the exponential schedule would be defined by two parameters (a and b) and a variable N: lr = a * exp(-b * N), where a is the initial lr, b is the decay rate, and N is the number of episodes/batches/steps, etc.
Below is an example of my current Ray Tune experiment object:
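(In rough form; the search-space bounds, metric, and stopping criterion below are placeholders rather than my real values.)

from ray import air, tune
from ray.tune.search.hyperopt import HyperOptSearch

param_space = {
    "env": "CartPole-v1",                 # placeholder env
    "framework": "tf2",
    "lr": tune.loguniform(1e-5, 1e-2),    # static lr, sampled by HyperOpt
    "gamma": tune.uniform(0.9, 0.999),
}

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    tune_config=tune.TuneConfig(
        search_alg=HyperOptSearch(metric="episode_reward_mean", mode="max"),
        num_samples=20,
    ),
    run_config=air.RunConfig(stop={"training_iteration": 4000}),
)
results = tuner.fit()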
I think the question is more about whether RLlib's PPO trainer supports taking a "hyperparameterized" exponential-decay lr scheduler and, if so, what format Tune is expected to pass it in.
HyperOpt or other searchers are not that relevant here. At the end of the day, the decay rate is just treated as one more hyperparameter.
cc @kourosh to comment on PPO Trainer functionality.
Yeah, my bad, it's less relevant to the search algorithm itself. Overall I would like my tuner to be able to tune both the decay rate and the initial lr of an exponential decay, not a linear decay, which is apparently what "lr_schedule" does, if I am not mistaken.
My understanding is that you want to implement lr_schedule inside the PPO Algorithm. I looked at the code, and it seems like we currently only support a piecewise-linear learning-rate schedule for PPO. I highly encourage you to file a GitHub issue with a feature request (and maybe tag me so I can keep track of it on the future RLlib feature wishlist).
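For reference, the built-in schedule is configured in training() as a list of [timestep, lr] pairs that RLlib linearly interpolates between; the numbers below are arbitrary placeholders:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")       # placeholder env
    .training(
        lr_schedule=[
            [0, 3e-3],                # lr at timestep 0
            [1_000_000, 1e-4],        # linearly annealed to 1e-4 by 1M timesteps
        ],
    )
)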
After digging a little into the LearningRateSchedule class in torch_mixins and the callbacks API, here is my proposal for how you can achieve this manually: introduce a callback that updates the learning rate on the local_worker that trains the policy on every iteration.
import math

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.algorithms.ppo import PPOConfig


class LRDecayCallback(DefaultCallbacks):
    def on_train_result(
        self,
        *,
        algorithm,
        result: dict,
        **kwargs,
    ) -> None:
        iteration = algorithm.iteration
        local_worker = algorithm.workers.local_worker()
        # we will introduce this in a new config below
        lr_decay = algorithm.config.lr_decay
        # for torch policies: recompute lr = initial_lr * exp(-lr_decay * iteration)
        # from the configured initial lr, so it matches the desired schedule instead
        # of compounding the decay on every iteration
        for pid in local_worker.policy_map:
            policy = local_worker.policy_map[pid]
            policy.cur_lr = algorithm.config.lr * math.exp(-lr_decay * iteration)
            # push the new lr into every torch optimizer param group
            for opt in policy._optimizers:
                for p in opt.param_groups:
                    p["lr"] = policy.cur_lr
class CustomPPOConfig(PPOConfig):
    def __init__(self, algo_class=None):
        super().__init__(algo_class)
        # exponential decay rate, exposed as a regular (tunable) config field
        self.lr_decay = None

    def training(self, lr_decay=None, **kwargs):
        # only overwrite when explicitly provided, so later training() calls
        # without lr_decay don't reset it
        if lr_decay is not None:
            self.lr_decay = lr_decay
        return super().training(**kwargs)


config = (
    CustomPPOConfig()
    .framework("torch")
    .environment("CartPole-v1")
    .training(lr_decay=0.01, lr=3e-3)
    .callbacks(LRDecayCallback)
)

algo = config.build()
for i in range(10):
    policy = algo.get_policy()
    cur_lr = policy.cur_lr
    print(f"Iteration {i}: lr = {cur_lr}")
    algo.train()
Once this is done, you can apply the regular Tuner.fit() operation to this new config, with Tune search spaces for hyperparameter tuning.
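For example, something along these lines should let Tune sample both the initial lr (the a in lr = a * exp(-b * N)) and the decay rate b. The bounds, metric, and stop criterion are placeholders, and depending on your Ray version you may need to pass param_space=param_space.to_dict() (or register a custom Algorithm class) so the extra lr_decay key is carried through to the trainable:

from ray import air, tune
from ray.tune.search.hyperopt import HyperOptSearch

param_space = (
    CustomPPOConfig()
    .framework("torch")
    .environment("CartPole-v1")                 # placeholder env
    .callbacks(LRDecayCallback)
    .training(
        lr=tune.loguniform(1e-5, 1e-2),         # initial lr "a"
        lr_decay=tune.loguniform(1e-4, 1e-1),   # decay rate "b"
    )
)

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    tune_config=tune.TuneConfig(
        search_alg=HyperOptSearch(metric="episode_reward_mean", mode="max"),
        num_samples=20,
    ),
    run_config=air.RunConfig(stop={"training_iteration": 4000}),
)
tuner.fit()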
Thank you very much!! Unfortunately I am not experienced with torch; I am more experienced with tf2. I guess I'll try to work with torch, or maybe create something similar with tf2.