Custom Tensorboard Metric (episode.total_reward auto generates as mean, min, max)

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

I want to monitor the reward of each episode throughout training, rather than seeing min, max, and mean plots. I used the callback below, and TensorBoard is updated with the custom metric, but it automatically creates mean, min, and max series instead of a single series for the episode reward itself.

ray/tune/custom_metrics/individual_episode_reward_mean
ray/tune/custom_metrics/individual_episode_reward_min
ray/tune/custom_metrics/individual_episode_reward_max

Printing episode.total_reward to terminal during training gives a singular scalar value per episode as I would expect.

    # Import path assumes Ray 2.x; older releases use ray.rllib.agents.callbacks.
    from ray.rllib.algorithms.callbacks import DefaultCallbacks

    class CustomCallbacks(DefaultCallbacks):
        def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
            # Log the total reward for the episode.
            print("Episode Reward: ", episode.total_reward)
            episode.custom_metrics["individual_episode_reward"] = episode.total_reward

I basically want what is in ray/tune/hist_stats/episode_reward, but as a scalar plot instead of a histogram.

Hi @VisionZUS29,

It reports a mean, min, and max because a single training iteration may sample transitions from multiple episodes. If several episodes complete in that iteration, there are several episode returns, and the metrics report the min, max, and mean over all of them. Hist stats shows you a histogram of every episode return obtained in an iteration.
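
To illustrate the aggregation described above, here is a minimal plain-Python sketch. It is not RLlib's actual code; the function name `summarize_iteration` and the reward values are made up for the example. It only shows how several episode returns from one iteration collapse into the three reported series.

```python
from statistics import mean


def summarize_iteration(episode_rewards):
    """Collapse the returns of all episodes finished in one iteration.

    Hypothetical helper for illustration only; RLlib's internals differ.
    """
    if not episode_rewards:
        # No episode finished in this iteration, so there is nothing to report.
        return {}
    return {
        "individual_episode_reward_mean": mean(episode_rewards),
        "individual_episode_reward_min": min(episode_rewards),
        "individual_episode_reward_max": max(episode_rewards),
    }


# Made-up example: three episodes finished during a single iteration.
summary = summarize_iteration([10.0, 12.0, 17.0])
print(summary)
```

With only one finished episode per iteration, all three values would coincide with that episode's return.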

@mannyv What do you mean by iteration? Does the sampling depend on whether the config collects complete_episodes or truncate_episodes? I guess what I am looking for is a plot of reward vs. episode rather than reward vs. steps. My env terminates after a set number of steps, so each episode has the same number of steps. I still think it should make sense to have a single episode reward plot.

Perhaps this thread will be helpful:

@mannyv
I get the gist of what you are saying. I understand the difference in mean, min, max.

So why can’t I plot the reward of every episode completed throughout training, if it’s already being collected and sorted per train_batch_size?
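
One possible way to get a reward vs. episode series, sketched under the assumption that you can collect the list of episode returns for each iteration yourself (for example inside a callback). The function name `flatten_episode_rewards` and the sample data are hypothetical, and nothing here is an RLlib API. Each resulting pair could then be written to a logger as a scalar, using the episode index as the step.

```python
def flatten_episode_rewards(rewards_per_iteration):
    """Turn per-iteration lists of episode returns into (episode_index, reward) pairs.

    Hypothetical helper: the input is assumed to be one list of episode
    returns per training iteration, in the order the episodes finished.
    """
    series = []
    episode_index = 0
    for rewards in rewards_per_iteration:
        for reward in rewards:
            series.append((episode_index, reward))
            episode_index += 1
    return series


# Made-up data: two episodes in iteration 0, none in iteration 1, one in iteration 2.
per_iteration = [[10.0, 12.0], [], [17.0]]
series = flatten_episode_rewards(per_iteration)
for index, reward in series:
    print(index, reward)
```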