How could I implement gradient accumulation?

1. Severity of the issue: (select one)
High: Completely blocks me.

For my own research, as well as for recreation, I would like to work with gradient accumulation. From what I can tell, Ray/RLlib does not support this out of the box in the respective framework Learners.

Among other things, my current idea is to use compute_gradients while skipping optim.zero_grad() during the accumulation steps, and to likely return an empty gradients dict in those steps so nothing gets applied. However, I am not sure whether that is sufficient and what side effects might appear; see the sketch below for the bare mechanic I am trying to reproduce.
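For reference, here is a minimal plain-PyTorch sketch of that mechanic, outside of RLlib: zero_grad() is only called after a real optimizer step, and the intermediate micro-batches just accumulate into each parameter's .grad. The model, the synthetic data, and the ACCUM_STEPS value are all illustrative placeholders, not RLlib code.

```python
import torch

ACCUM_STEPS = 4     # micro-batches per optimizer update (illustrative)
MICRO_BATCHES = 12  # total micro-batches in this toy loop

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(MICRO_BATCHES):
    x = torch.randn(16, 8)  # synthetic micro-batch
    y = torch.randn(16, 2)

    # Scale the loss so the accumulated gradient matches the average over
    # ACCUM_STEPS micro-batches rather than their sum.
    loss = torch.nn.functional.mse_loss(model(x), y) / ACCUM_STEPS
    loss.backward()  # adds into each p.grad; nothing is cleared here

    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()                       # one "real" update
        optimizer.zero_grad(set_to_none=True)  # clear only after stepping
```

My assumption is that the equivalent place to do this in RLlib would be a custom TorchLearner whose compute_gradients skips the internal zero_grad() on intermediate steps and returns an empty dict, so that apply_gradients effectively becomes a no-op on those steps. I have not verified how this interacts with gradient clipping or the reported learner metrics, though.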

I’ll update this post with my findings; in the meantime, I would appreciate any further input that could help me reach an implementation supporting gradient accumulation and avoid pitfalls with RLlib’s framework.