Negative entropy with PPO and continuous action space

ema-pe · June 10, 2025, 3:48pm

1. Severity of the issue: (select one)

Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

Ray version: 2.40
Python version: 3.12.3
OS: Ubuntu 24.04.2 LTS

3. What happened vs. what you expected:

I’m running an experiment with PPO in a custom environment with a continuous action space, where the action distribution is a Dirichlet distribution. After running the experiment, I plotted some training statistics from the result.json file, in particular the entropy; entropy_coeff was left at its default value of zero. If I understand correctly, the entropy should always be nonnegative. However, I consistently got negative values in the range [-8, 0]. What could be the cause? Is this normal for a continuous action space, or is there a problem with my experiment (maybe a hyperparameter issue)?

ema-pe · June 12, 2025, 2:14pm

Hello, I solved the issue: it was neither a Ray nor a PyTorch bug. I was wrong about the entropy of a continuous probability distribution. For a continuous distribution, such as a Dirichlet distribution, the reported entropy is the differential entropy, which, unlike the entropy of a discrete distribution, can be negative.
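
For anyone who wants to check this numerically, here is a minimal sketch that evaluates the closed-form differential entropy of a Dirichlet distribution using only the Python standard library. The function names `digamma` and `dirichlet_entropy` are my own for illustration (they are not Ray or PyTorch APIs), and the digamma function is approximated with a recurrence plus an asymptotic series, so treat the results as approximate.

```python
import math


def digamma(x: float) -> float:
    """Approximate the digamma function psi(x) for x > 0.

    Illustrative helper: shifts x upward with psi(x) = psi(x + 1) - 1/x,
    then applies the standard asymptotic expansion for large x.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def dirichlet_entropy(alpha: list[float]) -> float:
    """Differential entropy (in nats) of Dirichlet(alpha).

    H = ln B(alpha) + (a0 - K) * psi(a0) - sum_j (alpha_j - 1) * psi(alpha_j)
    where a0 = sum(alpha), K = len(alpha), and
    ln B(alpha) = sum_j lgamma(alpha_j) - lgamma(a0).
    """
    k = len(alpha)
    a0 = sum(alpha)
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(a0)
    return (
        log_beta
        + (a0 - k) * digamma(a0)
        - sum((a - 1.0) * digamma(a) for a in alpha)
    )


if __name__ == "__main__":
    # Uniform distribution over the 3-simplex: entropy is -ln(2), already negative.
    print(f"{dirichlet_entropy([1.0, 1.0, 1.0]):.3f}")  # -0.693
    # A sharply peaked Dirichlet concentrates its mass, so the entropy drops further.
    print(dirichlet_entropy([50.0, 50.0, 50.0]))
```

Even the uniform Dirichlet over three components has negative entropy, and the value decreases as the concentration parameters grow. Under this reading, an entropy that becomes more negative during training would be consistent with the policy becoming more peaked rather than with a bug.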
