Entropy of policy network's output

Lauritowal · March 8, 2021, 8:50am

Hi!
How can I display the entropy of the policy network’s outputs in TensorBoard,
relative to the maximum possible entropy?

Ameer_Haj_Ali · March 10, 2021, 9:35pm

CC @rliaw. Can you please help?

rliaw · March 10, 2021, 9:53pm

@sven1977 is the right person to answer this question!

sven1977 · March 11, 2021, 7:40am

Hey @Lauritowal, you can specify a custom callback that calculates this value and stores it as a custom metric so that it shows up in TensorBoard (TB).

Like so:

1. Check out the example script rllib/examples/custom_metrics_and_callbacks.py.
2. Override on_postprocess_trajectory and take a look at the postprocessed_batch arg therein. You should be able to see “ACTION_PROBS” and “ACTION_LOGP” as keys. You could use them to calculate the entropy. For a discrete action space with n actions, the maximum possible entropy is log(n), so dividing by that gives the entropy relative to the maximum.
3. Store that entropy inside e.g. episode.custom_metrics["policy_entropy"].
4. You should now see these values in TB.
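
The entropy calculation from the steps above could look roughly like this. It is a minimal sketch, not RLlib's own code. It assumes a discrete action space with at least two actions, and it assumes the batch also carries the raw logits under the key "action_dist_inputs" (that key, the helper names, and the metric name are assumptions for illustration). If only the log-probabilities of the sampled actions are available, the negative mean of those values is a sample-based estimate of the entropy instead.

```python
import numpy as np


def normalized_entropy(logits):
    """Mean categorical entropy over a batch, divided by log(num_actions).

    Returns a value in [0, 1]: 1.0 for a uniform policy, close to 0.0 for a
    nearly deterministic one. Assumes the last axis holds per-action logits.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the action axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    entropy = -(probs * log_probs).sum(axis=-1)  # one value per timestep
    max_entropy = np.log(logits.shape[-1])       # entropy of the uniform distribution
    return float(entropy.mean() / max_entropy)


def record_policy_entropy(episode, postprocessed_batch, key="action_dist_inputs"):
    """Hypothetical helper to call from on_postprocess_trajectory.

    `key` is an assumption: it must point at the logits in the batch.
    """
    episode.custom_metrics["policy_entropy_normalized"] = normalized_entropy(
        postprocessed_batch[key]
    )
```

Inside the overridden on_postprocess_trajectory, a call like record_policy_entropy(episode, postprocessed_batch) would then store the value under episode.custom_metrics, as described in the steps above.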
