1. Severity of the issue: (select one)
High: Completely blocks me.
For my own research, as well as for recreation, I would like to use gradient accumulation. From what I can tell, Ray/RLlib does not support this in the respective framework learners.
My current idea is, among other things, to override compute_gradients,
skip optim.zero_grad()
during my accumulation steps, and likely return an empty gradients dict in those steps. However, I am not sure whether that is sufficient and what side effects might appear.
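To make the intended control flow concrete, here is a minimal, framework-agnostic sketch of the idea (the class and method names here are hypothetical and only mirror the pattern, not RLlib's actual Learner API): gradients are summed over `accum_steps` micro-batches, an empty dict is returned on accumulation steps so nothing gets applied, and the averaged gradients are only returned (and the buffer reset, the analogue of calling `optim.zero_grad()`) on every N-th step.

```python
class GradAccumulator:
    """Sketch of gradient-accumulation control flow (hypothetical names,
    not RLlib's API): sum gradients over `accum_steps` micro-batches and
    only emit them on the final step of each cycle."""

    def __init__(self, accum_steps: int):
        self.accum_steps = accum_steps
        self._step = 0
        self._acc = {}  # param name -> accumulated gradient value

    def compute_gradients(self, grads: dict) -> dict:
        # Sum this micro-batch's gradients into the running buffer
        # (the analogue of NOT calling optim.zero_grad() between steps).
        for name, g in grads.items():
            self._acc[name] = self._acc.get(name, 0.0) + g
        self._step += 1
        if self._step % self.accum_steps != 0:
            # Accumulation step: report no gradients, so no update is applied.
            return {}
        # Apply step: average over the cycle and reset the buffer
        # (the analogue of zero_grad after the optimizer step).
        out = {name: g / self.accum_steps for name, g in self._acc.items()}
        self._acc = {}
        return out
```

Whether returning an empty dict is safe inside RLlib's actual learner loop (e.g. for metrics, LR schedules, or gradient clipping that may run per call) is exactly the open question above.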
I’ll update this post with my findings. In the meantime, I would appreciate any input that could help me reach an implementation supporting gradient accumulation while avoiding pitfalls in RLlib’s framework.