I was wondering why the entropy value is so high when training with IMPALA. Below is an example of the entropy values for IMPALA (blue) vs. PPO (orange). Even on CartPole-v0, which only has an action space of size 2, the entropy is above 250 with IMPALA.
I am using ray==1.2 and tensorflow==2.3, with default hyperparameters for both PPO and IMPALA.
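For reference, here is why the number looks wrong to me. The per-step entropy of a 2-action categorical policy cannot exceed ln(2) ≈ 0.693 nats, so a value above 250 cannot be a per-step mean. My guess (an assumption, I have not confirmed it in the RLlib source) is that the two algorithms aggregate the entropy differently, for example summing over the batch in one case and averaging in the other. A minimal sketch of that effect, using a hypothetical batch size of 500 chosen only for illustration:

```python
import math


def categorical_entropy(probs):
    """Shannon entropy, in nats, of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A uniform policy over CartPole's 2 actions gives the maximum possible
# per-step entropy, which is ln(2).
max_entropy = categorical_entropy([0.5, 0.5])

# Hypothetical batch size, for illustration only (not an RLlib default).
batch_size = 500
per_step = [categorical_entropy([0.5, 0.5]) for _ in range(batch_size)]

# Averaging over the batch stays bounded by ln(2) ...
mean_entropy = sum(per_step) / batch_size
# ... while summing over the batch scales with the batch size.
summed_entropy = sum(per_step)

print(f"max per-step entropy: {max_entropy:.3f}")
print(f"mean over batch:      {mean_entropy:.3f}")
print(f"sum over batch:       {summed_entropy:.1f}")
```

If that is what is happening, the IMPALA and PPO curves would not be directly comparable unless the summed value is divided by the number of timesteps in the batch.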
I would appreciate any information about this. Thanks a lot!