Entropy value in IMPALA

Hello,

I was wondering why the entropy value is so high when training with IMPALA. Below is an example of the entropy values in IMPALA (blue) vs PPO (orange). Even on CartPole-v0, which only has an action space of size 2, the entropy is above 250 with IMPALA.

[Plot: entropy during training, IMPALA (blue) vs PPO (orange)]

I am using ray==1.2 and tensorflow==2.3, with the default hyperparameters for both PPO and IMPALA.

I would appreciate any information about this. Thanks a lot!

Hey @Fabien-Couthouis, interesting observation. This cannot be correct, indeed :).
The entropies should be the same (especially for CartPole after so many timesteps!). Will take a look.


Found it: for IMPALA, for some reason, we report the sum of all entropies over the train batch (size 500 by default), whereas for PPO we report the mean.
I'll change IMPALA to report the mean as well.
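To make the numbers above concrete: with 2 actions, the per-timestep policy entropy is at most ln(2) ≈ 0.69, so summing it over a 500-sample train batch easily lands above 250, while the mean stays below 0.69. A minimal standalone sketch of the difference (not RLlib's actual loss code; the logits here are just random placeholders):

```python
import tensorflow as tf

# Placeholder per-timestep action logits for a batch of 500 CartPole steps
# (2 discrete actions), only to illustrate the sum-vs-mean difference.
logits = tf.random.normal([500, 2])
probs = tf.nn.softmax(logits)
log_probs = tf.nn.log_softmax(logits)

# Per-timestep policy entropy: -sum_a pi(a|s) * log pi(a|s), at most ln(2) here.
per_step_entropy = -tf.reduce_sum(probs * log_probs, axis=-1)

# What IMPALA was logging: the sum over the batch (hundreds for batch size 500).
entropy_sum = tf.reduce_sum(per_step_entropy)

# What PPO logs (and what the fix switches IMPALA to): the batch mean.
entropy_mean = tf.reduce_mean(per_step_entropy)

print(float(entropy_sum), float(entropy_mean))
```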

Well, that was quick!
Thanks a lot, @sven1977.

Here is the PR: [RLlib] Discussion 1709: IMPALA (tf and torch) reports sum of entropy (over batch) in stats. Should report mean instead. by sven1977 · Pull Request #15290 · ray-project/ray · GitHub

I think the same fix (reporting the mean instead of the sum) should also apply to pi_loss and vf_loss, as is done in PPO, because the sum is a bit confusing (see the sketch after the plot below).
Do you agree?

[Plot: vf_loss during training, IMPALA (blue) vs PPO (orange); the pi_loss plot shows the same pattern]
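For reference, a rough sketch of what that change would look like in a stats dict (the per-sample loss tensors and the key names are placeholders, not RLlib's actual code):

```python
import tensorflow as tf

# Placeholder per-timestep loss terms from an IMPALA-style loss, batch size 500.
pi_loss_per_step = tf.random.normal([500])   # policy-gradient term per sample
vf_loss_per_step = tf.random.normal([500])   # value-function term per sample

stats = {
    # Before: sums over the train batch, which scale with the batch size.
    # "pi_loss": tf.reduce_sum(pi_loss_per_step),
    # "vf_loss": tf.reduce_sum(vf_loss_per_step),
    # After (matching PPO): batch means, comparable across batch sizes.
    "pi_loss": tf.reduce_mean(pi_loss_per_step),
    "vf_loss": tf.reduce_mean(vf_loss_per_step),
}
```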

You are right. Could you do a PR with a fix for this? You can use the above PR as a template.

The pull request can be found here: [RLlib] Discussion 1709: IMPALA (tf and torch) reports sum of losses (over batch) in stats. Should report mean instead. by Fabien-Couthouis · Pull Request #15427 · ray-project/ray · GitHub


Merged 🙂
Thanks for this quick fix @Fabien-Couthouis !