Tensorboard Results

Hello, I am using the LGSVL environment, which uses the OpenAI Gym interface. I have a question regarding two metrics: “action0_mean” and “entropy”. As you can see in the attached screenshot, the two plots are identical. How is this possible? Furthermore, the action is bounded between -1 and 1, but the plot exceeds these bounds. I appreciate any help.


Hey @Ouissam , great question! :slight_smile: The model_action0_mean is probably the raw model’s output. Note that the model - in your continuous-actions case - will output a tensor whose first half is interpreted as the mean and the second half as the log(std) values of a diagonal Gaussian distribution. That’s why the values may be beyond -1 and 1. Only after that does RLlib do action “unsquashing” and scale these values into the env’s bounds. On the entropy being the same: that’s indeed strange and I would not expect this. What algo are you using?
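To make that concrete, here is a minimal sketch (not RLlib’s actual internals, the numbers and names are illustrative) of how a continuous-action model output is interpreted for a 1-D action space:

```python
import numpy as np

# For a 1-D action space the model emits 2 numbers per step:
# the first half is the Gaussian mean, the second half is log(std).
model_out = np.array([1.7, -0.3])   # raw, unbounded network output
action_dim = 1

mean = model_out[:action_dim]        # this is what "action0_mean" tracks
log_std = model_out[action_dim:]
std = np.exp(log_std)

# Sample from the diagonal Gaussian -- still unbounded at this point,
# which is why the plotted mean can lie outside [-1, 1].
raw_action = mean + std * np.random.randn(action_dim)
print(mean, raw_action)
```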

@Ouissam @sven1977 Regarding the actions not being bounded between -1 and 1, Sven is right: the model_action0_mean values are the action logits before being squashed (tanh).
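Roughly, the squashing step looks like this (an illustrative sketch, assuming a simple [low, high] box action space, not the exact library code):

```python
import numpy as np

def squash_to_env_bounds(raw_action, low=-1.0, high=1.0):
    """Illustrative squashing: tanh maps the unbounded Gaussian sample
    into (-1, 1), then it is rescaled linearly into [low, high]."""
    squashed = np.tanh(raw_action)                      # (-1, 1)
    return low + (squashed + 1.0) * 0.5 * (high - low)  # [low, high]

print(squash_to_env_bounds(np.array([1.7])))   # well inside the bounds
print(squash_to_env_bounds(np.array([10.0])))  # saturates near +1
```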

Regarding the plots being the same, I think the action mean being equal to the entropy is an artifact of your environment. I tried Atari and MuJoCo with DQN, PPO, and SAC, and these metrics differ.

Thank you for the replies @sven1977 and @michaelzhiluo! I am using the APPO algorithm with the LGSVL environment. Is there a possibility for me to compute and plot these values myself, or to see where they come from?
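Here is my rough attempt at computing the entropy myself from the raw model output, just using the closed-form expression for a diagonal Gaussian (the numbers are made up; in a real run I would pull mean/log_std out of the policy’s forward pass or the train batch):

```python
import numpy as np

def diag_gaussian_entropy(log_std):
    """Closed-form entropy of a diagonal Gaussian:
    sum_i (log_std_i + 0.5 * log(2 * pi * e))."""
    return np.sum(log_std + 0.5 * np.log(2.0 * np.pi * np.e))

# Hypothetical raw model output for a 1-D action
# (first half = mean, second half = log_std).
model_out = np.array([1.7, -0.3])
mean, log_std = model_out[:1], model_out[1:]

print("entropy:", diag_gaussian_entropy(log_std))
# Note: the entropy depends only on log_std, not on the mean -- so if
# the plotted entropy tracks action0_mean exactly, something upstream
# (e.g. which keys get logged) is likely mixed up.
```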

Were you able to find out anything new regarding this problem, or do you have any suggestions for plotting the correct entropy?