I am using RLlib for training my own robot agent.
I want to ask how to obtain the variance of the reward over a period, ideally over the same window that is used to compute episode_reward_mean.
It would be even better if the variance could be shown in TensorBoard, so that I can post-process and plot it after training.
Based on the code you mentioned, I have a very simple idea: add stddev computation lines to the following part:
if episode_rewards:
    reward_stddev = np.std(episode_rewards)
else:
    reward_stddev = float("nan")
and then the dict returned by this function should also include
episode_rewards_stddev=reward_stddev,
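If you would rather not patch RLlib's own metrics code, a less invasive variant of the same idea is a custom callback that recomputes the stddev from the per-episode rewards RLlib already reports under hist_stats. This is only a sketch: the exact import path and callback signature depend on your Ray version (newer versions pass algorithm instead of trainer), and episode_reward_stddev is just a key name I chose here.

```python
import numpy as np

# Newer Ray versions use ray.rllib.algorithms.callbacks instead.
from ray.rllib.agents.callbacks import DefaultCallbacks


class RewardStddevCallbacks(DefaultCallbacks):
    """Add the stddev of the recent episode rewards to each training result."""

    def on_train_result(self, *, trainer=None, result=None, **kwargs):
        # hist_stats/episode_reward should hold the rewards of the episodes
        # that went into episode_reward_mean for this iteration.
        episode_rewards = result.get("hist_stats", {}).get("episode_reward", [])
        result["episode_reward_stddev"] = (
            float(np.std(episode_rewards)) if episode_rewards else float("nan")
        )
```

With config["callbacks"] = RewardStddevCallbacks, the new key ends up in the result dict next to episode_reward_mean, so it should be handled by the same loggers.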
One thing I want to ask about is TensorBoard.
I checked the code in tune/logger and train/callbacks/logging, and it looks like the data should be added to TensorBoard if it is an instance of one of the predefined data types.
But I didn't see items like policy_reward_min in my previous experiments, so why was that?
Is it because the data corresponding to policy_reward_min does not belong to those limited data types, or is there other processing that stops these items from being added to TensorBoard?
I double-checked with the team: as long as your metrics are part of the result dict that is returned to Tune, the TBXLogger will automatically log them to the TensorBoard output file.
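For the post-processing and plotting part, note that the same flattened result dict is also written to each trial's progress.csv, so something like the following sketch can be used after training. The CSV path is just a placeholder for your own trial directory, and episode_reward_stddev assumes the custom metric added above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder path: point this at your actual Tune trial directory.
df = pd.read_csv("~/ray_results/my_experiment/PPO_trial_dir/progress.csv")

mean = df["episode_reward_mean"]
std = df["episode_reward_stddev"]  # the custom metric added via the callback

plt.plot(df["training_iteration"], mean, label="episode_reward_mean")
plt.fill_between(df["training_iteration"], mean - std, mean + std,
                 alpha=0.3, label="+/- 1 stddev")
plt.xlabel("training_iteration")
plt.ylabel("reward")
plt.legend()
plt.show()
```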