PPO with Critic and no GAE

tibogiss · May 3, 2021, 7:08am

Hi everyone,

For the PPO algorithm, with use_gae=False and use_critic=True, it seems to me that no value function is learnt. I may be missing something, but looking at the code here, I cannot see a branch that handles this case, so we would seemingly end up with mean_vf_loss = tf.constant(0.0).

I think the PPO paper mentions using GAE, but having this option can be misleading.

Thanks for your help / confirmation that I did not miss anything.

sven1977 · May 3, 2021, 7:42am

Yes, this looks like a bug in PPO. We’ll probably just have to change the if condition from

if policy.config["use_gae"]

to:

if policy.config["use_critic"]
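
To illustrate the effect of that change, here is a minimal, framework-free sketch. It is not RLlib's actual loss code: the helper name mean_vf_loss, its arguments, and the plain squared-error loss are assumptions made for illustration only. The one point it shows is that the value loss is gated on use_critic rather than use_gae, so the critic is still trained when GAE is switched off.

```python
import numpy as np


def mean_vf_loss(config, value_preds, value_targets):
    """Hypothetical helper, not part of RLlib: mean value-function loss."""
    # Gate on "use_critic" (not "use_gae"), so a critic is still
    # trained when GAE is disabled.
    if config["use_critic"]:
        preds = np.asarray(value_preds, dtype=np.float64)
        targets = np.asarray(value_targets, dtype=np.float64)
        # Plain squared error between predictions and targets
        # (an assumption; a real PPO loss may also clip this term).
        return float(np.mean((preds - targets) ** 2))
    # No critic configured: the value loss contributes nothing.
    return 0.0


# The case from the question: critic enabled, GAE disabled.
config = {"use_gae": False, "use_critic": True}
loss = mean_vf_loss(config, [1.0, 2.0], [0.0, 4.0])
print(loss)
```

With the condition keyed on use_gae instead, the same config would fall through to the zero branch, which matches the behaviour described in the question.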
