Entropy value in IMPALA

Hello,

I was wondering why the entropy value is so high when training with IMPALA. Below is an example of the entropy values in IMPALA (blue) vs PPO (orange). Even on CartPole-v0, which only has an action space of size 2, the entropy is above 250 with IMPALA.

[Plot: entropy during training, IMPALA (blue) vs PPO (orange)]

I am using ray==1.2 and tensorflow==2.3, with the default hyperparameters for both PPO and IMPALA.

I would appreciate any information about this. Thanks a lot!

Hey @Fabien-Couthouis, interesting observation. This cannot be correct, indeed :).
The entropies should be the same (especially for CartPole after so many timesteps!). Will take a look.


Found it: for IMPALA, for some reason, we report the sum of all entropies over the train batch (size 500 by default), whereas for PPO we report the mean.
I'll change IMPALA to report the mean as well.
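To make the numbers above concrete: with 2 actions, the per-timestep policy entropy is at most ln(2) ≈ 0.69, so summing it over a 500-sample train batch easily lands above 250, while the mean stays below 0.69. A minimal standalone sketch of the difference (not RLlib's actual loss code; the logits here are just random placeholders):

```python
import tensorflow as tf

# Placeholder per-timestep action logits for a batch of 500 CartPole steps
# (2 discrete actions), only to illustrate the sum-vs-mean difference.
logits = tf.random.normal([500, 2])
probs = tf.nn.softmax(logits)
log_probs = tf.nn.log_softmax(logits)

# Per-timestep policy entropy: -sum_a pi(a|s) * log pi(a|s), at most ln(2) here.
per_step_entropy = -tf.reduce_sum(probs * log_probs, axis=-1)

# What IMPALA was logging: the sum over the batch (hundreds for batch size 500).
entropy_sum = tf.reduce_sum(per_step_entropy)

# What PPO logs (and what the fix switches IMPALA to): the batch mean.
entropy_mean = tf.reduce_mean(per_step_entropy)

print(float(entropy_sum), float(entropy_mean))
```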

Well, that was quick!
Thanks a lot, @sven1977.

Here is the PR: [RLlib] Discussion 1709: IMPALA (tf and torch) reports sum of entropy (over batch) in stats. Should report mean instead. by sven1977 · Pull Request #15290 · ray-project/ray · GitHub

I think the same fix (reporting the mean instead of the sum) should also apply to pi_loss and vf_loss, as is done in PPO, because the sum is a bit confusing (see the sketch after the plot below).
Do you agree?

[Plot: vf_loss during training, IMPALA (blue) vs PPO (orange); the pi_loss plot shows the same pattern]
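For reference, a rough sketch of what that change would look like in a stats dict (the per-sample loss tensors and the key names are placeholders, not RLlib's actual code):

```python
import tensorflow as tf

# Placeholder per-timestep loss terms from an IMPALA-style loss, batch size 500.
pi_loss_per_step = tf.random.normal([500])   # policy-gradient term per sample
vf_loss_per_step = tf.random.normal([500])   # value-function term per sample

stats = {
    # Before: sums over the train batch, which scale with the batch size.
    # "pi_loss": tf.reduce_sum(pi_loss_per_step),
    # "vf_loss": tf.reduce_sum(vf_loss_per_step),
    # After (matching PPO): batch means, comparable across batch sizes.
    "pi_loss": tf.reduce_mean(pi_loss_per_step),
    "vf_loss": tf.reduce_mean(vf_loss_per_step),
}
```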

You are right. Could you do a PR with a fix for this? You can use the above PR as a template.

The pull request can be found here: [RLlib] Discussion 1709: IMPALA (tf and torch) reports sum of losses (over batch) in stats. Should report mean instead. by Fabien-Couthouis · Pull Request #15427 · ray-project/ray · GitHub


Merged 🙂
Thanks for this quick fix @Fabien-Couthouis !