Based on the code you mentioned, I have a very simple idea: add lines that compute the stddev to the following part:
if episode_rewards:
    reward_stddev = np.std(episode_rewards)
else:
    reward_stddev = float("nan")
and then the dict returned by this function should also include
episode_rewards_stddev = reward_stddev,
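To make the idea concrete, here is a minimal self-contained sketch. The helper name `summarize_rewards` is hypothetical and is not the library's actual function; only `np.std`, the NaN fallback, and the `episode_rewards_stddev` key come from the lines above. I am also assuming the NaN case applies when no episode rewards have been collected yet.

```python
import math

import numpy as np


def summarize_rewards(episode_rewards):
    """Hypothetical helper: reward stddev with a NaN fallback for empty input."""
    if episode_rewards:
        # np.std defaults to the population standard deviation (ddof=0).
        reward_stddev = float(np.std(episode_rewards))
    else:
        # Assumed fallback: no rewards collected, so nothing to measure.
        reward_stddev = float("nan")
    return dict(episode_rewards_stddev=reward_stddev)


stats = summarize_rewards([1.0, 2.0, 3.0, 4.0])
empty_stats = summarize_rewards([])
```

Casting to `float` keeps the value a plain Python number rather than a NumPy scalar, which may matter if a downstream logger checks types.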
One thing I want to ask is about TensorBoard.
I checked the code in tune/logger and train/callbacks/logging, and
as far as I can tell, a value should be added to TensorBoard if it is an instance of one of the predefined data classes.
But I didn't see items like policy_reward_min in my previous experiments, so why was that?
Is it because the data corresponding to policy_reward_min does not belong to one of those supported data classes, or is there some other processing that stops these items from being added to TensorBoard?