Incredibly large policy entropy


I’m running A3C with RLlib. My policy is discrete with 8 actions, so the maximum possible entropy should be $\log 8 \approx 2.08$ (natural log). However, the value I read from TensorBoard is about 30.
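As a quick sanity check on that upper bound (a minimal sketch; PyTorch’s `Categorical.entropy()` uses the natural log):

```python
import math
import torch

# A uniform distribution over 8 actions attains the maximum entropy.
probs = torch.full((8,), 1.0 / 8)
dist = torch.distributions.Categorical(probs=probs)
entropy = dist.entropy().item()

# ln(8) ≈ 2.079, so any single policy entropy above that is suspicious.
print(entropy, math.log(8))
```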

I also checked the source code and found that the displayed value is actually an averaged value: ray/ at master · ray-project/ray · GitHub

Is there a bug in the implemented entropy calculation, or did I just miss something?


Hey @gjoliver @sven1977 @avnishn any ideas here?

Hi @yunfanjiang ,

in the code, the entropy is calculated each iteration; the implementation calls the entropy() method of the used dist_class, which is an ActionDistribution object.

In the model catalog, the action distribution is assigned based on the action space. Since discrete actions are used, the TorchCategorical action distribution is chosen. Therein, torch.distributions.categorical.Categorical is assigned to self.dist (so the entropy() method is called on this dist object), and that distribution computes the entropy as:

def entropy(self):
    min_real = torch.finfo(self.logits.dtype).min
    logits = torch.clamp(self.logits, min=min_real)
    p_log_p = logits * self.probs
    return -p_log_p.sum(-1)

Hope this sheds some light on your results.

Thank you! That really helps.
