PPO lr_schedule not working

Hello, I would like to understand exactly how lr_schedule is supposed to work. I specify the learning rate schedule like this:

"lr": 0.00001
"lr_schedule": [

However, as you can see in the image, the learning rate schedule is not being applied. Do the values 50 and 100 correspond to the “step”?

I tried to create a linear schedule, but it seems the LearningRateSchedule class only accepts a PiecewiseSchedule. That is fine for the linear case, I guess, but not for a polynomial one:

lr_schedule = PolynomialSchedule(args.lrate[2], args.lrate[1], args.framework, args.lrate[0], 1.0)

class LearningRateSchedule:
    """Mixin for TFPolicy that adds a learning rate schedule."""

    def __init__(self, lr, lr_schedule):
        self._lr_schedule = None
        if lr_schedule is None:
            self.cur_lr = tf1.get_variable(
                "lr", initializer=lr, trainable=False)
        else:
            self._lr_schedule = PiecewiseSchedule(
                lr_schedule, outside_value=lr_schedule[-1][-1], framework=None)
            self.cur_lr = tf1.get_variable(
                "lr", initializer=self._lr_schedule.value(0), trainable=False)
            if self.framework == "tf":
                self._lr_placeholder = tf1.placeholder(
                    dtype=tf.float32, name="lr")
                self._lr_update = self.cur_lr.assign(
                    self._lr_placeholder, read_value=False)
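
Since the mixin above wraps whatever list it receives in a PiecewiseSchedule, one possible workaround for a polynomial decay is to sample the polynomial at a handful of timesteps and pass the resulting [timestep, lr] pairs as lr_schedule. Below is a minimal sketch in plain Python. The helper name `polynomial_breakpoints`, the decay formula, and all numeric values are assumptions for illustration and are not part of RLlib.

```python
def polynomial_breakpoints(initial_lr, final_lr, total_timesteps,
                           power=2.0, num_points=11):
    """Sample a polynomial decay into a list of [timestep, lr] pairs.

    Hypothetical helper: the piecewise-linear interpolation between these
    points only approximates the polynomial curve. More points give a
    closer fit.
    """
    points = []
    for i in range(num_points):
        frac = i / (num_points - 1)              # progress in [0, 1]
        t = int(round(frac * total_timesteps))   # breakpoint in timesteps
        lr = final_lr + (initial_lr - final_lr) * (1.0 - frac) ** power
        points.append([t, lr])
    return points


# Example values are made up; adjust them to your own run.
schedule = polynomial_breakpoints(1e-5, 1e-6, total_timesteps=400_000)
```

The resulting list has the same shape as a hand-written lr_schedule, so it could be placed in the config under "lr_schedule" in the same way.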

I ran some experiments and I think I understand it now; please correct me if I got this wrong.
The breakpoints are not iteration numbers (50 to 100). Instead, I need to specify batch_size * number of iterations, correct?

Exactly. This schedule is based on timesteps, not iterations.
It’s a common confusion point for folks.
Sorry you had to figure this out yourself.
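
To make the timestep point concrete, here is a small sketch in plain Python of converting iteration numbers into timestep breakpoints and of how a piecewise-linear schedule would then be evaluated. The function names, the batch size of 4000, and the learning rate values are assumptions for illustration only; this is not RLlib's own implementation.

```python
def iterations_to_timesteps(iteration, train_batch_size):
    """Timesteps seen after `iteration` iterations (assumes one batch each)."""
    return iteration * train_batch_size


def piecewise_linear(schedule, t):
    """Linearly interpolate between [timestep, value] breakpoints.

    Any t outside the breakpoints returns the last value, mirroring
    outside_value=lr_schedule[-1][-1] in the code posted above.
    """
    for (t0, v0), (t1, v1) in zip(schedule[:-1], schedule[1:]):
        if t0 <= t < t1:
            alpha = (t - t0) / (t1 - t0)
            return v0 + alpha * (v1 - v0)
    return schedule[-1][1]


batch = 4000  # hypothetical train batch size
# Hold the rate until iteration 50, then decay linearly until iteration 100.
schedule = [
    [0, 1e-5],
    [iterations_to_timesteps(50, batch), 1e-5],
    [iterations_to_timesteps(100, batch), 1e-6],
]

lr_at_iteration_75 = piecewise_linear(
    schedule, iterations_to_timesteps(75, batch))
```

With breakpoints written as 50 and 100 directly, the whole decay would finish within the first 100 timesteps, so the rate would already sit at its final value from the first iteration onward.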

Thanks for your reply @gjoliver !