Getting negative explained variance when training a PPO agent

I am training a PPO agent (using RLlib) and have found that, no matter which hyperparameters I set, the explained variance (i.e., vf_explained_var in the figure below) is always a negative value close to zero.

[figure: explained_variance (vf_explained_var over training)]
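For context, my setup looks roughly like the sketch below. The environment name and the exact hyperparameter values are placeholders rather than my real ones; the config keys follow RLlib's `PPOConfig` API (Ray 2.x), and I have tried varying the value-function-related ones in particular.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")    # placeholder; my real env is a custom one
    .training(
        gamma=0.99,
        lr=3e-4,
        clip_param=0.2,
        vf_loss_coeff=1.0,         # weight of the value-function loss
        vf_clip_param=10.0,        # clip range for the value-function loss
        train_batch_size=4000,
    )
)

algo = config.build()
for _ in range(100):
    result = algo.train()
    # vf_explained_var shows up under the learner stats; the exact key path
    # can differ between Ray versions / API stacks.
    stats = result["info"]["learner"]["default_policy"]["learner_stats"]
    print(stats.get("vf_explained_var"))
```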

I learned from this post that the explained variance should be as close to 1.0 as possible. However, I found that even with a negative explained variance, my agent is still able to converge, as shown in the figure below.

[figure: episode reward over training]
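For reference, my understanding is that this metric follows the standard definition of explained variance (the same one used by sklearn's `explained_variance_score`; I assume RLlib's vf_explained_var is computed the same way), so a value near zero means the critic predicts the returns no better than a constant, and a negative value means it does slightly worse:

```python
import numpy as np

def explained_variance(returns: np.ndarray, values: np.ndarray) -> float:
    """1 - Var(returns - values) / Var(returns).

    ~1.0  -> the value function tracks the returns well
    ~0.0  -> no better than predicting the mean return
    < 0.0 -> worse than predicting the mean return
    """
    var_returns = np.var(returns)
    if var_returns == 0:
        return float("nan")
    return 1.0 - np.var(returns - values) / var_returns

# Toy example: a critic whose predictions are slightly anti-correlated
# with the actual returns.
returns = np.array([1.0, 2.0, 3.0, 4.0])
values = returns.mean() - 0.1 * (returns - returns.mean())
print(explained_variance(returns, values))   # ~ -0.21: negative and close to zero
```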

So my questions are:

  1. Is it normal to have a negative, near-zero explained variance? If not, what can I do to push it toward 1.0?

  2. My agent always converges to a locally optimal policy. Could this be caused by the negative explained variance?