PPO with Critic and no GAE

Hi everyone,

For the PPO algorithm, with use_gae=False and use_critic=True, it seems to me that no value function is learned. I may be missing something, but looking at the code I cannot see a branch corresponding to this case, so we end up with mean_vf_loss = tf.constant(0.0).

I think the PPO paper recommends using GAE, but having this option can be misleading.

Thanks for your help / confirmation that I did not miss anything.

Yes, this looks like a bug in PPO. We'll probably just have to change the if condition from

if policy.config["use_gae"]

to

if policy.config["use_critic"]
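
As a rough illustration of what gating on use_critic would mean, here is a framework-free sketch in plain Python. It is not the actual RLlib loss: the function name mean_vf_loss and its arguments are made up for this example, and it uses a plain squared error without any value clipping. The point is only that the value loss should be computed whenever use_critic is set, regardless of use_gae.

```python
def mean_vf_loss(config, value_preds, value_targets):
    """Hypothetical helper: mean squared value error, or 0.0 without a critic."""
    # Gate on use_critic (not use_gae), so the critic is still trained
    # when GAE is switched off.
    if config["use_critic"]:
        errors = [(v - t) ** 2 for v, t in zip(value_preds, value_targets)]
        return sum(errors) / len(errors)
    # No critic: nothing to fit, so the value loss is a constant zero.
    return 0.0


# The case from the question: critic enabled, GAE disabled.
config = {"use_gae": False, "use_critic": True}
loss = mean_vf_loss(config, [1.0, 2.0], [0.0, 4.0])
```

With the condition keyed on use_gae instead, the configuration above would fall through to the zero-loss branch, which matches the behaviour described in the question.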