Add episode reward variance to metrics and TensorBoard


I am using RLlib to train my own robot agent.
How can I obtain the variance of the episode reward over a period, ideally the same period used to compute episode_reward_mean?

It would be even better if the variance could be shown in TensorBoard so that I can post-process and plot it after training.

I would appreciate any ideas on this!

This is a good place to start looking. Maybe you can contribute a patch for episode_reward_stddev to RLlib? :slight_smile:

Hi, gjoliver.

Thanks a lot for your helpful information.

Based on the code you mentioned, I have a very simple idea: add the stddev computation to the following part, next to the existing min/max/mean branches:

if episode_rewards:
    reward_stddev = np.std(episode_rewards)
else:
    reward_stddev = float("nan")

and then add the following entry to the dict returned by this function:

episode_rewards_stddev = reward_stddev,
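
For reference, the idea above can be sketched as a standalone function. This is only a minimal illustration, not RLlib's actual code: summarize_rewards is a hypothetical helper, and it uses the standard library's statistics.pstdev, which computes the population standard deviation, the same quantity np.std returns by default (ddof=0).

```python
import math
import statistics


def summarize_rewards(episode_rewards):
    """Hypothetical helper (not part of RLlib): summarize a list of episode rewards."""
    if episode_rewards:
        reward_mean = statistics.fmean(episode_rewards)
        # Population standard deviation, matching np.std's default (ddof=0).
        reward_stddev = statistics.pstdev(episode_rewards)
    else:
        # No completed episodes in this period: report NaN, like the other stats.
        reward_mean = float("nan")
        reward_stddev = float("nan")
    return dict(
        episode_reward_mean=reward_mean,
        episode_rewards_stddev=reward_stddev,
    )


summary = summarize_rewards([1.0, 2.0, 3.0, 4.0])
empty_summary = summarize_rewards([])
```

If the variance itself is wanted, squaring the reported stddev during post-processing gives it.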

One more question, about TensorBoard.
I checked the code in tune/logger and train/callbacks/logging, and
it looks like a value is written to TensorBoard only if it is an instance of one of the predefined data types.

But I didn’t see an item such as policy_reward_min in my previous experiments. Why is that?
Is it because the data corresponding to policy_reward_min is not one of those accepted data types, or is there some other processing that stops these items from being added to TensorBoard?

I double-checked with the team: as long as your metrics are part of the result dict that is returned to Tune, TBXLogger will automatically log them to the TensorBoard output file.
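
As a rough illustration of why an entry in the result dict might still not appear as a scalar, here is a sketch of the kind of filtering a scalar logger could apply. This is not TBXLogger's actual code: loggable_scalars is a hypothetical helper, and the assumption that only finite numeric scalars are written (so empty dicts and NaN values produce nothing) should be checked against your Ray version.

```python
import math


def loggable_scalars(result, prefix=""):
    """Hypothetical helper: flatten a nested result dict, keeping finite numbers only."""
    flat = {}
    for key, value in result.items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Nested dicts are flattened; an empty dict contributes no entries.
            flat.update(loggable_scalars(value, name))
        elif (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and math.isfinite(value)
        ):
            flat[name] = value
    return flat


# Example result dict with made-up values, for illustration only.
result = {
    "episode_reward_mean": 12.5,
    "episode_rewards_stddev": 3.0,
    "policy_reward_min": {},           # empty per-policy dict: nothing to log
    "episode_len_mean": float("nan"),  # NaN: skipped under this assumption
    "info": {"num_steps_trained": 4000},
}
scalars = loggable_scalars(result)
```

Under these assumptions, a policy_reward_min that is an empty dict would leave no trace in the output, which may be worth checking by printing the result dict from your own run.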

Thank you for this reply.
I will check my own code for that.