Negative explained variance when training a PPO agent

I am training a PPO agent (using RLlib) and found that, no matter what hyperparameters I set, the explained variance (i.e., `vf_explained_var` in the figure below) is always a negative value close to zero.


I learned from this post that the explained variance should be as close to 1.0 as possible. However, I found that even with a negative explained variance, my agent is still able to converge, as shown in the figure below.
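For context on what the metric means, explained variance is commonly computed as `1 - Var(returns - value_preds) / Var(returns)`; I believe RLlib's `vf_explained_var` follows this standard definition, though the exact implementation details (clipping, batching) are an assumption here. A minimal sketch of the metric, showing how near-constant value predictions produce a slightly negative, near-zero value like the one I observe:

```python
import numpy as np

def explained_variance(returns, value_preds):
    """Standard explained-variance metric: 1 - Var(returns - preds) / Var(returns).

    ~1.0  : the value function predicts returns well.
    ~0.0  : predictions are no better than predicting the mean return.
    < 0.0 : predictions add variance, i.e., they are worse than the mean.
    """
    returns = np.asarray(returns, dtype=np.float64)
    value_preds = np.asarray(value_preds, dtype=np.float64)
    var_returns = np.var(returns)
    if var_returns == 0:
        return float("nan")  # metric undefined for constant returns
    return 1.0 - np.var(returns - value_preds) / var_returns

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.normal(size=1000)

    # Accurate predictions -> explained variance close to 1
    print(explained_variance(returns, returns + 0.1 * rng.normal(size=1000)))

    # Near-constant predictions uncorrelated with the returns -> the
    # residual variance slightly exceeds Var(returns), giving a small
    # negative value, similar to what I see in vf_explained_var
    print(explained_variance(returns, 0.05 * rng.normal(size=1000)))
```

So a value stuck just below zero suggests the critic's outputs are barely tracking the actual returns at all.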


So my questions are:

  1. Is it normal to have a negative, near-zero explained variance? If not, what can I do to bring it closer to 1?

  2. My agent always converges to a locally optimal policy. Could this be caused by the negative explained variance?