PPO lr_schedule not working

MRuiz · February 12, 2022, 10:27pm

Hello, I just want to understand exactly how lr_schedule is supposed to work. I specify the learning rate schedule this way:

"lr": 0.00001,
"lr_schedule": [
    [50, 0.00001],
    [100, 0.000001]
]

but as you can see in the image, the learning rate schedule is not working. Do the values 50 and 100 correspond to the “step”?
Thanks!

MRuiz · February 13, 2022, 10:05am

I tried to create a linear schedule, but it seems the LearningRateSchedule class only accepts a PiecewiseSchedule. That is fine for a linear schedule, I guess, but not for a polynomial one:

lr_schedule = PolynomialSchedule(args.lrate[2], args.lrate[1], args.framework, args.lrate[0], 1.0)

class LearningRateSchedule:
    """Mixin for TFPolicy that adds a learning rate schedule."""

    @DeveloperAPI
    def __init__(self, lr, lr_schedule):
        self._lr_schedule = None
        if lr_schedule is None:
            self.cur_lr = tf1.get_variable(
                "lr", initializer=lr, trainable=False)
        else:
            self._lr_schedule = PiecewiseSchedule(
                lr_schedule, outside_value=lr_schedule[-1][-1], framework=None)
            self.cur_lr = tf1.get_variable(
                "lr", initializer=self._lr_schedule.value(0), trainable=False)
            if self.framework == "tf":
                self._lr_placeholder = tf1.placeholder(
                    dtype=tf.float32, name="lr")
                self._lr_update = self.cur_lr.assign(
                    self._lr_placeholder, read_value=False)
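Since the mixin above always wraps lr_schedule in a PiecewiseSchedule, one possible workaround is to sample a polynomial curve into [timestep, lr] pairs and pass those as lr_schedule. The following is only a sketch: the helper names, the number of sample points, and the 1,000,000-timestep horizon are hypothetical, and the decay formula is the usual polynomial decay form, not something taken from this thread.

```python
def polynomial_lr(t, schedule_timesteps, initial_lr, final_lr, power):
    """Polynomial decay from initial_lr to final_lr over schedule_timesteps."""
    frac = min(t, schedule_timesteps) / schedule_timesteps
    return final_lr + (initial_lr - final_lr) * (1.0 - frac) ** power


def polynomial_as_piecewise(schedule_timesteps, initial_lr, final_lr,
                            power, num_points=11):
    """Sample the polynomial curve into [timestep, lr] pairs (hypothetical helper)."""
    points = []
    for i in range(num_points):
        # Evenly spaced timesteps from 0 to schedule_timesteps, inclusive.
        t = round(i * schedule_timesteps / (num_points - 1))
        lr = polynomial_lr(t, schedule_timesteps, initial_lr, final_lr, power)
        points.append([t, lr])
    return points


# Assumed values for illustration: decay from 1e-5 to 1e-6 over 1M timesteps.
lr_schedule = polynomial_as_piecewise(1_000_000, 1e-5, 1e-6, power=2.0)
```

The resulting list can be used as the "lr_schedule" config value. The piecewise schedule interpolates linearly between the sampled points, so more points give a closer approximation of the polynomial.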

MRuiz · February 13, 2022, 11:21am

I ran some experiments to understand this, and I think I got it. Please correct me if I am wrong.
The schedule entries are not numbers of iterations (50 and 100); I need to specify batch_size * number of iterations instead, correct?

gjoliver · February 14, 2022, 8:42pm

Exactly. This schedule is based on timesteps, not iterations.
It’s a common confusion point for folks.
Sorry you had to figure this out yourself.
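To make the timestep-based behavior concrete, here is a minimal sketch that converts a schedule written in iterations into timesteps and evaluates it. It assumes each training iteration collects a fixed "train_batch_size" of 4000 timesteps, which is an illustrative value and not taken from this thread. The function names are hypothetical, and piecewise_lr is a plain-Python stand-in for the interpolation, not RLlib's own class.

```python
TRAIN_BATCH_SIZE = 4000  # assumed timesteps collected per training iteration


def iterations_to_timesteps(schedule_in_iterations, train_batch_size):
    """Convert [[iteration, lr], ...] into [[timestep, lr], ...]."""
    return [[it * train_batch_size, lr] for it, lr in schedule_in_iterations]


def piecewise_lr(schedule, timestep):
    """Linearly interpolate between schedule points.

    Outside the listed range this returns the last lr value, mirroring
    outside_value=lr_schedule[-1][-1] in the class quoted earlier.
    """
    for (t0, lr0), (t1, lr1) in zip(schedule[:-1], schedule[1:]):
        if t0 <= timestep < t1:
            alpha = (timestep - t0) / (t1 - t0)
            return lr0 + alpha * (lr1 - lr0)
    return schedule[-1][1]


# The schedule from the first post, interpreted as iterations.
schedule_iters = [[50, 0.00001], [100, 0.000001]]
schedule_steps = iterations_to_timesteps(schedule_iters, TRAIN_BATCH_SIZE)
# schedule_steps is [[200000, 1e-05], [400000, 1e-06]]

# Halfway between the two points (iteration 75 under the assumed batch size).
lr_mid = piecewise_lr(schedule_steps, 300_000)
```

With the original values of 50 and 100 read as timesteps, the whole schedule would be over well before the first iteration ends, which would explain a learning rate curve that looks flat.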

MRuiz · February 15, 2022, 9:15am

Thanks for your reply @gjoliver !
