Incredibly large policy entropy


I’m running A3C with RLlib. My policy is discrete with 8 actions, so the maximum possible entropy should be $\log 8 \approx 2.08$ (natural log). However, the value I read from TensorBoard is about 30.
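As a quick sanity check on that upper bound (a minimal sketch; PyTorch’s `Categorical.entropy()` uses the natural log):

```python
import math
import torch

# A uniform distribution over 8 actions attains the maximum entropy.
probs = torch.full((8,), 1.0 / 8)
dist = torch.distributions.Categorical(probs=probs)
entropy = dist.entropy().item()

# ln(8) ≈ 2.079, so any single policy entropy above that is suspicious.
print(entropy, math.log(8))
```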

I also checked the source code and found that the displayed value is actually an averaged value: ray/ at master · ray-project/ray · GitHub

Is there a bug in the implemented entropy calculation, or did I just miss something?


Hey @gjoliver @sven1977 @avnishn any ideas here?

Hi @yunfanjiang ,

in the code, the entropy is calculated each iteration; the implementation calls the entropy() method of the used dist_class, which is an ActionDistribution object.

In the model catalog, the action distribution is assigned based on the action space. Since discrete actions are used, the TorchCategorical action distribution is chosen. Therein, torch.distributions.categorical.Categorical is assigned to self.dist (so the entropy() method is called on this dist object), and that distribution computes the entropy as:

def entropy(self):
    min_real = torch.finfo(self.logits.dtype).min
    logits = torch.clamp(self.logits, min=min_real)
    p_log_p = logits * self.probs
    return -p_log_p.sum(-1)

Hope this sheds some light on your results.

Thank you! That really helps.
