1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.
2. Environment:
- Ray version: 2.40
- Python version: 3.12.3
- OS: Ubuntu 24.04.2 LTS
3. What happened vs. what you expected:
I’m running a PPO experiment in a custom environment with a continuous action space, where the actions are modeled by a Dirichlet distribution. After the run, I plotted some training statistics from the result.json file, in particular the entropy, with entropy_coeff left at its default of zero. If I understand correctly, the entropy should always be nonnegative. However, I consistently get negative values, between -8 and 0. What could be the cause? Is this normal for a continuous action space, or is there a problem with my experiment (for example, a hyperparameter issue)?
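For reference, this is roughly how the entropy values can be pulled out of result.json for plotting. It is only a minimal sketch: it assumes the file holds one JSON object per training iteration, one per line. The helper name load_metric and the key path in the comments are placeholders, since the location of the entropy metric depends on the Ray version and API stack in use.

```python
import json
from pathlib import Path


def load_metric(result_path, key_path):
    """Collect one value per training iteration for a nested metric.

    Assumes result_path is a JSON-lines file (one JSON object per line).
    key_path is a sequence of nested dictionary keys; iterations where
    the key is missing yield None.
    """
    values = []
    with Path(result_path).open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            node = json.loads(line)
            for key in key_path:
                if not isinstance(node, dict) or key not in node:
                    node = None
                    break
                node = node[key]
            values.append(node)
    return values


# Hypothetical usage; the key path below is an assumption and must be
# adjusted to wherever the entropy appears in your own result.json:
# entropy = load_metric("result.json", ("learners", "default_policy", "entropy"))
# print(min(v for v in entropy if v is not None))
```

Printing the minimum and maximum of the returned list is a quick way to confirm the range of the plotted values before looking for a cause.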