Severity of the issue: High (completely blocks me).
For my own research, as well as for recreation, I would like to work with gradient accumulation. From my research, it does not appear that Ray/RLlib supports this in the respective framework Learners.
Among other things, I have overridden compute_gradients to prevent optim.zero_grad() during my accumulation steps, and I will likely return an empty dict in those steps. However, I am not sure whether that is sufficient and what side effects might appear.
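Below is a minimal sketch of what I have in mind. It assumes RLlib's new API stack `TorchLearner` and touches the internal attributes `self._optimizer_parameters` and `self._params` (internal, so they may change between Ray versions); the import path for `ALL_MODULES` may also differ by version, and `GradAccumTorchLearner` and `ACCUM_STEPS` are names/values I made up for illustration:

```python
from ray.rllib.core import ALL_MODULES
from ray.rllib.core.learner.torch.torch_learner import TorchLearner


class GradAccumTorchLearner(TorchLearner):
    """Hypothetical Learner that accumulates gradients over several minibatches."""

    ACCUM_STEPS = 4  # my choice: minibatches to accumulate per optimizer step

    def build(self):
        super().build()
        self._accum_count = 0  # minibatches seen since the last update

    def compute_gradients(self, loss_per_module, **kwargs):
        # Clear the grad buffers only at the START of an accumulation cycle;
        # backward() then adds onto the existing buffers, which is exactly
        # the accumulation we want.
        if self._accum_count == 0:
            for optim in self._optimizer_parameters:
                optim.zero_grad(set_to_none=True)

        loss_per_module[ALL_MODULES].backward()
        self._accum_count += 1

        if self._accum_count < self.ACCUM_STEPS:
            # Accumulation step: report no gradients yet.
            return {}

        # Update step: hand the accumulated gradients to apply_gradients().
        self._accum_count = 0
        return {pid: p.grad for pid, p in self._params.items()}

    def apply_gradients(self, gradients_dict):
        # compute_gradients() returns {} during accumulation steps; skip the
        # optimizer step explicitly, since I am not sure the base class
        # treats an empty dict as a no-op.
        if not gradients_dict:
            return
        super().apply_gradients(gradients_dict)
```

If an average rather than a sum over the accumulated minibatches is wanted, dividing each loss by `ACCUM_STEPS` before `backward()` should do it.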
I’ll update this post with my findings; however, I would appreciate any further input that could help me reach an implementation supporting gradient accumulation and avoid pitfalls in RLlib’s framework.
Hi @Daraan, thanks for the question and for sharing your code here. You did everything as intended: create your own Learner and then override compute_gradients. Great work!
I just skimmed it, and one thing looked suspicious to me: it appears that zero_grad is called in the last accumulation step, the one at which the update happens. As a result, only the last gradient is used, while the history of accumulated gradients is wiped out.
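In plain-PyTorch terms (a schematic toy example, not your actual code), the intended ordering is: step on the accumulated gradients first, then clear the buffers:

```python
import torch

ACCUM_STEPS = 4
model = torch.nn.Linear(8, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(16):
    batch = torch.randn(32, 8)
    loss = model(batch).pow(2).mean()  # toy loss, just for illustration
    loss.backward()                    # adds onto the existing .grad buffers
    if step % ACCUM_STEPS == ACCUM_STEPS - 1:
        # Step on the ACCUMULATED gradients first, THEN clear the buffers
        # for the next cycle. Calling zero_grad() here before backward()/
        # step() (as your code seems to do) would wipe the previous
        # ACCUM_STEPS - 1 minibatches and step with only the last gradient.
        optim.step()
        optim.zero_grad()
```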