Incredibly large policy entropy

yunfanjiang · November 8, 2021, 12:43am

Hi!

I’m running A3C with RLlib. My policy is discrete with 8 actions, so the maximum possible entropy should be $\log 8 \approx 2.08$ (natural log). However, TensorBoard shows a value of about 30.
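As a quick sanity check on that bound, here is a minimal sketch in plain Python (the helper name `entropy` is made up for illustration and is not part of RLlib). It evaluates the entropy of a uniform distribution over 8 actions, which is the maximum for a discrete policy of that size:

```python
import math

def entropy(probs, base=math.e):
    """Shannon entropy of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

n_actions = 8
uniform = [1.0 / n_actions] * n_actions  # uniform policy maximizes entropy

max_entropy_nats = entropy(uniform)          # natural log
max_entropy_bits = entropy(uniform, base=2)  # log base 2

print(f"{max_entropy_nats:.4f} nats")  # 2.0794 nats
print(f"{max_entropy_bits:.4f} bits")  # 3.0000 bits
```

Either way, a single 8-action distribution cannot have an entropy anywhere near 30.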

I also checked the source code and found that the displayed value is actually an average (see a3c_torch_policy.py on the master branch of ray-project/ray on GitHub).

Are there any bugs with the implemented entropy calculation? Or did I just miss something?

Thanks!

amogkam · November 11, 2021, 6:07pm

Hey @gjoliver @sven1977 @avnishn any ideas here?

Lars_Simon_Zehnder · November 13, 2021, 12:05pm

Hi @yunfanjiang ,

In the code, the entropy is computed each iteration in a3c_torch_policy.py. The implementation calls the entropy() method of the dist_class in use, which is an ActionDistribution.

The model catalog assigns the action distribution based on the action space. Since discrete actions are used here, the TorchCategorical action distribution is chosen. It assigns a torch.distributions.categorical.Categorical to self.dist (so entropy() is called on this dist object), and that distribution computes the entropy as:

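def entropy(self):
    # self.logits holds normalized log-probabilities; clamp -inf to the
    # smallest finite value so that 0 * (-inf) does not produce NaN.
    min_real = torch.finfo(self.logits.dtype).min
    logits = torch.clamp(self.logits, min=min_real)
    p_log_p = logits * self.probs
    # Sum over the action dimension: one entropy value (in nats) per sample.
    return -p_log_p.sum(-1)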
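To illustrate how a logged value near 30 could arise from this formula, here is a rough sketch in plain Python. It mirrors the per-sample computation above and compares two ways of reducing over a batch. The function name `categorical_entropy` and the batch of 15 samples are hypothetical, and whether the A3C loss sums or averages over the batch is an assumption to check against a3c_torch_policy.py; it is not confirmed in this thread.

```python
import math

def categorical_entropy(raw_logits):
    """Entropy (in nats) of softmax(raw_logits), computed as -sum(p * log p)."""
    # Normalize to log-probabilities via a numerically stable log-sum-exp.
    m = max(raw_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in raw_logits))
    log_probs = [x - log_z for x in raw_logits]
    return -sum(math.exp(lp) * lp for lp in log_probs)

# Hypothetical batch: 15 samples, each with 8 equal logits (a uniform policy).
batch = [[0.0] * 8 for _ in range(15)]
per_sample = [categorical_entropy(logits) for logits in batch]

mean_entropy = sum(per_sample) / len(per_sample)  # bounded by log(8)
summed_entropy = sum(per_sample)                  # grows with batch size

print(f"mean: {mean_entropy:.3f}")    # mean: 2.079
print(f"sum:  {summed_entropy:.3f}")  # sum:  31.192
```

Each per-sample entropy stays at or below log 8, so a reported value above that bound suggests a reduction over more than one sample rather than an error in the formula itself.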

Hope this sheds some light on your results.

yunfanjiang · November 13, 2021, 5:22pm

Thank you! That really helps.
