SAC: two different sets of learning rates?

1. Severity of the issue:
Low: Annoying but doesn’t hinder my work.

2. Environment:

  • Ray version: 2.47.1
  • Python version: 3.11
  • OS: Linux

The SAC source code defines two different sets of learning rates (actor, critic and alpha). Their default values differ, and the user can override either set through the config. Why is that, and which set is actually used? Is one of the sets deprecated?

The code below is copied from this link:
https://docs.ray.io/en/latest/_modules/ray/rllib/algorithms/sac/sac.html

self.optimization = {
    "actor_learning_rate": 3e-4,
    "critic_learning_rate": 3e-4,
    "entropy_learning_rate": 3e-4,
}
self.actor_lr = 3e-5
self.critic_lr = 3e-4
self.alpha_lr = 3e-4

Hello! RLlib went through a migration to a new API stack, so one set is used by the new stack and the other by the old stack. Which stack you use is ultimately up to you, but I suggest moving to the new stack if possible.

actor_learning_rate (in the optimization dict) is only used on the old stack.
actor_lr is only used on the new stack.
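
To make the split concrete, here is a minimal sketch in plain Python. It does not import Ray: SacLearningRates is a stand-in that copies the defaults quoted in the question, and effective_learning_rates is a hypothetical helper, not an RLlib function. It also assumes that critic_lr and alpha_lr pair up with critic_learning_rate and entropy_learning_rate the same way actor_lr pairs up with actor_learning_rate.

```python
from dataclasses import dataclass, field


@dataclass
class SacLearningRates:
    """Stand-in for the attributes quoted from sac.py (not the real SACConfig)."""

    # Old-stack settings, kept in a dict.
    optimization: dict = field(
        default_factory=lambda: {
            "actor_learning_rate": 3e-4,
            "critic_learning_rate": 3e-4,
            "entropy_learning_rate": 3e-4,
        }
    )
    # New-stack settings, named after the main `lr` attribute.
    actor_lr: float = 3e-5
    critic_lr: float = 3e-4
    alpha_lr: float = 3e-4


def effective_learning_rates(cfg: SacLearningRates, new_stack: bool) -> dict:
    """Hypothetical helper: return the set of rates that the chosen stack reads."""
    if new_stack:
        return {
            "actor": cfg.actor_lr,
            "critic": cfg.critic_lr,
            "alpha": cfg.alpha_lr,
        }
    opt = cfg.optimization
    return {
        "actor": opt["actor_learning_rate"],
        "critic": opt["critic_learning_rate"],
        # Assumed pairing: the entropy rate plays the role of alpha_lr.
        "alpha": opt["entropy_learning_rate"],
    }


cfg = SacLearningRates()
new_rates = effective_learning_rates(cfg, new_stack=True)
old_rates = effective_learning_rates(cfg, new_stack=False)
print("new stack:", new_rates)
print("old stack:", old_rates)
```

The practical consequence is that changing a value in the set your stack does not read has no effect on training, so it is worth checking which stack your config has enabled before tuning either set.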

I also asked our engineering team, who clarified the naming: “our main learning rate attribute is named lr and so we wanted to adhere to this naming for other learning rates”.

Since the old stack is going to be deprecated later this year, I recommend using the new stack. The ambiguity should be gone by then too. Thanks for the question!