Negative entropy with PPO and continuous action space

1. Severity of the issue: (select one)

Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.40
  • Python version: 3.12.3
  • OS: Ubuntu 24.04.2 LTS

3. What happened vs. what you expected:

I’m running a PPO experiment in a custom environment with a continuous action space, where the action distribution is a Dirichlet distribution. After the run, I plotted some training statistics from the result.json file, in particular the entropy; entropy_coeff was left at its default of zero. If I understand correctly, entropy should always be nonnegative. However, I consistently got negative values, in the range [-8, 0]. What could be the cause? Is this normal for a continuous action space, or is there a problem with my experiment (maybe a hyperparameter issue)?

Hello, I solved the issue: it was neither a Ray nor a PyTorch bug. I was simply wrong about the entropy of a continuous probability distribution. For a continuous distribution, such as a Dirichlet distribution, the reported entropy is the differential entropy, which can be negative because a probability density, unlike a discrete probability, can exceed 1.
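
For anyone who wants to check this numerically, here is a minimal sketch that evaluates the closed-form differential entropy of a Dirichlet distribution using only the Python standard library. It is not what RLlib or PyTorch run internally; the helper names `digamma` and `dirichlet_entropy` and the example concentration values are my own, chosen for illustration. The digamma function is approximated with the standard recurrence plus asymptotic series, which should be accurate enough for a sanity check.

```python
import math


def digamma(x: float) -> float:
    """Approximate the digamma function psi(x) for x > 0.

    Shifts x upward with psi(x) = psi(x + 1) - 1/x, then applies the
    standard asymptotic expansion, which is accurate for large x.
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (
        math.log(x)
        - 0.5 / x
        - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    )
    return result


def dirichlet_entropy(alpha: list[float]) -> float:
    """Differential entropy (in nats) of Dirichlet(alpha).

    H = log B(alpha) + (a0 - K) * psi(a0) - sum_j (a_j - 1) * psi(a_j),
    where a0 = sum(alpha) and K = len(alpha).
    """
    a0 = sum(alpha)
    k = len(alpha)
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(a0)
    return (
        log_beta
        + (a0 - k) * digamma(a0)
        - sum((a - 1.0) * digamma(a) for a in alpha)
    )


if __name__ == "__main__":
    # Hypothetical concentration parameters, for illustration only.
    for alpha in ([1.0, 1.0], [1.0, 1.0, 1.0], [50.0, 50.0, 50.0]):
        print(alpha, dirichlet_entropy(alpha))
```

With `alpha = [1.0, 1.0, 1.0]` the distribution is uniform over the 2-simplex, whose density is the constant 2, so the entropy is exactly -log(2), about -0.693. Since the uniform case is the maximum-entropy Dirichlet for a given dimension, any more concentrated policy gives a lower value, so negative entropies in the PPO statistics are expected as the policy becomes more confident.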