Severity of the issue: High (completely blocks me).
For my own research, as well as for recreation, I would like to work with gradient accumulation. From my research, it does not appear that Ray/RLlib supports this in the respective framework Learners.
Among other things, I have overridden compute_gradients to prevent optim.zero_grad() during my accumulation steps, and I will likely return an empty dict in those steps. However, I am not sure whether that is sufficient and what side effects might appear.
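Below is a minimal sketch of what I have in mind. It assumes RLlib's new API stack `TorchLearner` and touches the internal attributes `self._optimizer_parameters` and `self._params` (internal, so they may change between Ray versions); the import path for `ALL_MODULES` may also differ by version, and `GradAccumTorchLearner` and `ACCUM_STEPS` are names/values I made up for illustration:

```python
from ray.rllib.core import ALL_MODULES
from ray.rllib.core.learner.torch.torch_learner import TorchLearner


class GradAccumTorchLearner(TorchLearner):
    """Hypothetical Learner that accumulates gradients over several minibatches."""

    ACCUM_STEPS = 4  # my choice: minibatches to accumulate per optimizer step

    def build(self):
        super().build()
        self._accum_count = 0  # minibatches seen since the last update

    def compute_gradients(self, loss_per_module, **kwargs):
        # Clear the grad buffers only at the START of an accumulation cycle;
        # backward() then adds onto the existing buffers, which is exactly
        # the accumulation we want.
        if self._accum_count == 0:
            for optim in self._optimizer_parameters:
                optim.zero_grad(set_to_none=True)

        loss_per_module[ALL_MODULES].backward()
        self._accum_count += 1

        if self._accum_count < self.ACCUM_STEPS:
            # Accumulation step: report no gradients yet.
            return {}

        # Update step: hand the accumulated gradients to apply_gradients().
        self._accum_count = 0
        return {pid: p.grad for pid, p in self._params.items()}

    def apply_gradients(self, gradients_dict):
        # compute_gradients() returns {} during accumulation steps; skip the
        # optimizer step explicitly, since I am not sure the base class
        # treats an empty dict as a no-op.
        if not gradients_dict:
            return
        super().apply_gradients(gradients_dict)
```

If an average rather than a sum over the accumulated minibatches is wanted, dividing each loss by `ACCUM_STEPS` before `backward()` should do it.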
I’ll update this post with my findings; however, I would appreciate any further input that could help me reach an implementation supporting gradient accumulation and avoid pitfalls in RLlib’s framework.
Hi @Daraan, thanks for the question and for sharing your code here. You did everything as intended: create your own Learner and then override compute_gradients. Great work!
I just skimmed it, and one thing looked suspicious to me: it appears that zero_grad is called in the last accumulation step, the one at which the update happens. As a result, only the last gradient is used, while the history of accumulated gradients is wiped out.
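In plain-PyTorch terms (a schematic toy example, not your actual code), the intended ordering is: step on the accumulated gradients first, then clear the buffers:

```python
import torch

ACCUM_STEPS = 4
model = torch.nn.Linear(8, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(16):
    batch = torch.randn(32, 8)
    loss = model(batch).pow(2).mean()  # toy loss, just for illustration
    loss.backward()                    # adds onto the existing .grad buffers
    if step % ACCUM_STEPS == ACCUM_STEPS - 1:
        # Step on the ACCUMULATED gradients first, THEN clear the buffers
        # for the next cycle. Calling zero_grad() here before backward()/
        # step() (as your code seems to do) would wipe the previous
        # ACCUM_STEPS - 1 minibatches and step with only the last gradient.
        optim.step()
        optim.zero_grad()
```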