Hi everyone,
For the PPO algorithm, in the case of `use_gae=False` and `use_critic=True`, it seems to me that no value function will be learnt. I may be missing something, but looking at the code here I cannot see a branch corresponding to my case, so we may end up with `mean_vf_loss = tf.constant(0.0)`.
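To make the concern concrete, here is a simplified sketch of the branching as I read it. This is not the library's actual code: the function name `ppo_loss_terms` and its arguments are hypothetical, and plain Python floats stand in for tensors.

```python
def ppo_loss_terms(policy_loss, value_preds, value_targets,
                   use_gae, use_critic=True, vf_loss_coeff=1.0):
    """Hypothetical simplification of how the PPO loss terms are combined.

    Assumption: the value loss is gated on use_gae only, so use_critic
    is accepted but never consulted in this sketch.
    """
    if use_gae:
        # Mean squared error between value predictions and their targets.
        squared_errors = [(v - t) ** 2
                          for v, t in zip(value_preds, value_targets)]
        mean_vf_loss = sum(squared_errors) / len(squared_errors)
        total_loss = policy_loss + vf_loss_coeff * mean_vf_loss
    else:
        # No value-function term at all: the critic gets no training signal.
        mean_vf_loss = 0.0
        total_loss = policy_loss
    return total_loss, mean_vf_loss


# The case from this post: use_gae=False, use_critic=True.
total, vf = ppo_loss_terms(0.5, [1.0, 2.0], [0.0, 0.0],
                           use_gae=False, use_critic=True)
print(total, vf)
```

If the real code behaves like this sketch, the value loss stays at zero whatever `use_critic` is set to, which is the situation I am asking about.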
I think the PPO paper mentions using GAE, but having this option available can be misleading.
Thanks for your help / confirmation that I did not miss anything.