How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
I am a little confused about the effect of the learning rate hyperparameters in the SAC config.
There is a common “lr” in common configs,
but there are also three separate learning rates (for the actor, the critic, and the entropy term)
in the SAC-specific configs.
I just want to ask what the effect is for each of them.
I would appreciate any explanation!
SAC employs three optimizers for the three losses indicated by the learning rates’ names.
This way they can be tuned independently of each other.
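As a rough illustration, the three SAC-specific rates could be set as in the sketch below. It is a plain Python dict and does not import Ray. The nesting under an "optimization" key and the 3e-4 values are assumptions based on the ray.rllib.agents.sac era of RLlib, so please check the defaults of the version you are running.

```python
# Sketch of the SAC-specific part of an RLlib config dict.
# ASSUMPTION: key names, the "optimization" nesting, and the 3e-4 values
# follow the ray.rllib.agents.sac era of RLlib; verify against your version.
sac_config = {
    "optimization": {
        "actor_learning_rate": 3e-4,    # optimizer for the policy (actor) network
        "critic_learning_rate": 3e-4,   # optimizer for the Q-network(s) (critic)
        "entropy_learning_rate": 3e-4,  # optimizer for the entropy coefficient
    },
}

# Each rate can be changed without touching the other two.
sac_config["optimization"]["critic_learning_rate"] = 1e-3
```

A dict like this would then be merged into the rest of the trainer config before training starts.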
Thus, each learning rate sets how quickly the parameters of the corresponding part of the DNNs change. The exact definition of the loss can be found at ray.rllib.agents.sac.sac_tf_policy.sac_actor_critic_loss. How this loss is composed and what exactly happens there is quite complex; if you don’t want to go through the paper, I would suggest this video.
At a high level: the actor and critic learning rates apply to the actor and critic, as you would expect; the entropy loss is an additional SAC loss that stems from SAC’s objective of maximizing not only your rewards but also the entropy of your actions.
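For reference, the maximum-entropy objective mentioned above is usually written as follows. This is the standard form from the SAC paper rather than something taken from the RLlib source; the symbols (policy \pi, reward r, entropy \mathcal{H}, temperature \alpha) are notation introduced here for illustration.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

The temperature \alpha weights the entropy bonus against the reward. When \alpha is learned rather than fixed, it is the quantity that the entropy learning rate applies to, as far as I understand the implementation.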
Thank you very much for answering.
Does that mean the “lr” in the common config has no effect in this case, so I only need to modify those three independently?
Yes, that is correct. Sorry for the late reply.