Two quick questions about GAE's implementation in RLlib

MCW_Lad · July 4, 2025, 7:43am

Hello, all. I was working on a project and ran into an out-of-memory error raised in general_advantage_estimation.py. After looking at the code, I found that, on line 96, the entire batch is fed into the encoder and value head at once. I also noticed that gradients are tracked in the __call__ method, even though (as far as I can tell) neither of its outputs (ADVANTAGES and VALUE_TARGETS) needs them.
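To show why gradient tracking looks unnecessary here: once the value predictions exist, GAE reduces to plain arithmetic on rewards and value estimates. The sketch below is the textbook GAE recurrence with the common convention value_targets = advantages + values. It is not RLlib's code, and `compute_gae` and its parameters are hypothetical names.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (advantages, value_targets) for one segment.

    Only already-computed value estimates are used, so nothing here needs
    gradients flowing back into the value network.
    """
    n = len(rewards)
    advantages = [0.0] * n
    gae = 0.0
    next_value = last_value  # bootstrap value for the state after the segment
    for t in reversed(range(n)):
        not_done = 0.0 if dones[t] else 1.0
        # TD error: r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Common convention: value targets are advantages plus the value baseline.
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets
```

Since both outputs are treated as fixed targets during the loss computation, the graph that produced `values` is never needed again.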

Is there a reason we compute gradients here and pass all the observations to the value head in one big batch?

I modified the file to disable gradient tracking and to feed the observations to the value head in smaller minibatches. I’ve tested these changes and everything seems to work. I’d be happy to clean up my code and submit a PR if it’d be useful.
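A minimal sketch of the kind of change described above, assuming a value function that maps a chunk of observations to a list of value estimates. `predict_values_in_chunks`, `value_fn`, `chunk_size`, and `grad_context` are hypothetical names, not RLlib API. In PyTorch one could pass `grad_context=torch.no_grad` and concatenate tensor chunks with `torch.cat` instead of extending a list; the default `contextlib.nullcontext` keeps the sketch dependency-free.

```python
import contextlib
from typing import Callable, List, Sequence


def predict_values_in_chunks(
    value_fn: Callable[[Sequence[float]], List[float]],
    observations: Sequence[float],
    chunk_size: int = 1024,
    grad_context: Callable[[], contextlib.AbstractContextManager] = contextlib.nullcontext,
) -> List[float]:
    """Hypothetical helper: evaluate value_fn on fixed-size chunks.

    Peak memory scales with chunk_size instead of the full batch, and the
    whole loop runs inside grad_context (e.g. torch.no_grad) so no autograd
    graph is kept for these predictions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    values: List[float] = []
    with grad_context():
        for start in range(0, len(observations), chunk_size):
            chunk = observations[start:start + chunk_size]
            values.extend(value_fn(chunk))
    return values


# Usage with a toy stand-in for the encoder + value head.
seen_sizes: List[int] = []


def toy_value_fn(chunk: Sequence[float]) -> List[float]:
    seen_sizes.append(len(chunk))
    return [2.0 * x for x in chunk]


vals = predict_values_in_chunks(toy_value_fn, list(range(10)), chunk_size=4)
```

The chunk size trades a few extra forward calls for a bounded memory footprint; the resulting values are identical to a single full-batch pass as long as the value function treats samples independently.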
