Independent gradient update for each loss

51616 · March 5, 2021, 2:57pm

As I understand it, RLlib computes all the losses at once and applies the gradient of the accumulated loss (e.g., critic + actor loss). I wonder if I could split this process into multiple gradient updates, for example, calculating the critic loss and applying its gradient "before" calculating the actor loss.

sven1977 · March 12, 2021, 8:16pm

For tf, yes, you can specify a custom gradient function for your policy (but you would have to re-“build” a new policy class via rllib/policy/tf_policy_template/build_tf_policy).
Then specify the `gradients_fn` and/or `apply_gradients_fn` arguments.

For torch, I don’t think we unified this behavior. You can only modify the gradients once they have been calculated, but the order of loss calculation, gradient calculation, and gradient application is fixed. We should make this more flexible here as well, though.

51616 · March 13, 2021, 9:04am

Thank you for your reply.
I went with overriding TorchPolicy, though, just like the QMIX code does, since I prefer torch. Basically, I override the `learn_on_batch` function and control the learning flow myself.
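
For anyone after the same thing, here is a minimal sketch of that learning flow in plain PyTorch. It is not the actual RLlib or QMIX code: the toy networks, the two optimizers, the losses, and the function name `learn_on_batch_sketch` are all made up for illustration. The only point is the ordering, i.e. the critic is updated before the actor loss is computed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy networks; in RLlib these would live inside the policy's model.
actor = nn.Linear(4, 2)
critic = nn.Linear(4, 1)

# One optimizer per loss, so each update only touches its own parameters.
actor_opt = torch.optim.SGD(actor.parameters(), lr=0.1)
critic_opt = torch.optim.SGD(critic.parameters(), lr=0.1)


def learn_on_batch_sketch(obs, actions, returns):
    """Critic step first, then an actor step that sees the updated critic."""
    # 1) Critic loss and its own gradient update.
    critic_loss = ((critic(obs).squeeze(-1) - returns) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # 2) Actor loss, computed only after the critic has been updated.
    with torch.no_grad():
        advantages = returns - critic(obs).squeeze(-1)
    dist = torch.distributions.Categorical(logits=actor(obs))
    actor_loss = -(dist.log_prob(actions) * advantages).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    return {"critic_loss": critic_loss.item(), "actor_loss": actor_loss.item()}


# Random stand-in data for a batch of 8 transitions.
obs = torch.randn(8, 4)
actions = torch.randint(0, 2, (8,))
returns = torch.randn(8)
stats = learn_on_batch_sketch(obs, actions, returns)
```

In an overridden `learn_on_batch`, the same two steps would run on the tensors taken from the sample batch instead of the random data used here.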
