Custom Tensorboard Metric (episode.total_reward auto generates as mean, min, max)

VisionZUS29 · June 11, 2024, 8:45pm

How severe does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

I want to monitor the reward of each individual episode throughout training, rather than seeing min, max, and mean plots. I used this callback and my TensorBoard is updated with the custom metric, but it automatically creates mean, min, and max series instead of just the episode reward itself.

ray/tune/custom_metrics/individual_episode_reward_mean
ray/tune/custom_metrics/individual_episode_reward_min
ray/tune/custom_metrics/individual_episode_reward_max

Printing episode.total_reward to terminal during training gives a singular scalar value per episode as I would expect.

    from ray.rllib.algorithms.callbacks import DefaultCallbacks

    class CustomCallbacks(DefaultCallbacks):
        def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
            # Log the total reward for the episode
            print("Episode Reward: ", episode.total_reward)
            # Stored as a custom metric; TensorBoard shows it as _mean/_min/_max
            episode.custom_metrics["individual_episode_reward"] = episode.total_reward

I want basically what is in ray/tune/hist_stats/episode_reward but as a scalar instead of histogram plot.

mannyv · June 12, 2024, 1:30am

Hi @VisionZUS29,

It reports a mean, min, and max because during one iteration it is potentially sampling transitions from multiple episodes. If multiple episodes complete, there will be multiple episode returns, and the metrics report the min, max, and mean over all of those episodes. Hist stats shows you a histogram of every episode return obtained in an iteration.
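To illustrate the aggregation described above, here is a minimal sketch in plain Python. The return values are made up, and the snippet only mimics the min/mean/max reduction; it is not RLlib's actual code.

```python
from statistics import mean

# Hypothetical returns of the episodes that completed during one iteration.
episode_returns = [12.0, 7.5, 20.0]

# Each statistic collapses all episodes of the iteration into one scalar,
# which is why three series show up instead of one value per episode.
summary = {
    "individual_episode_reward_min": min(episode_returns),
    "individual_episode_reward_mean": mean(episode_returns),
    "individual_episode_reward_max": max(episode_returns),
}
print(summary)
```

If exactly one episode completed per iteration, all three values would coincide with that episode's return.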

VisionZUS29 · June 12, 2024, 2:07am

@mannyv What do you mean by iteration? Is it sampling based on whether the config has collection set to complete_episodes vs truncate_episodes? I guess I am then looking to plot reward vs. episode rather than reward vs. steps. My env terminates after a set number of steps, so each episode has the same number of steps. I still think it should make sense to have a single episode reward plot.

mannyv · June 12, 2024, 2:25am

Perhaps this thread will be helpful:

VisionZUS29 · June 12, 2024, 3:43am

@mannyv
I get the gist of what you are saying. I understand the difference in mean, min, max.

So why can’t I plot every episode reward for every completed episode that occurred throughout the training? If it’s already being collected and sorted per train_batch_size?

narevau · June 24, 2024, 4:06pm

Hi @VisionZUS29,

Not quite sure if this is what you want, but I think you can set it in "episode.hist_data" like this:

        def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
            # Log the total reward for the episode
            print("Episode Reward: ", episode.total_reward)
            # hist_data values are expected to be lists, so wrap the scalar
            episode.hist_data["individual_episode_reward"] = [episode.total_reward]

It should then keep track of the reward of every episode in the result of result = algo.evaluate(), somewhere under hist_stats.
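A rough sketch of how a reward-vs-episode series could be rebuilt from such results. It assumes each result dict carries `episodes_this_iter` and `hist_stats["episode_reward"]`, and that the histogram list may also contain older episodes kept for smoothing, with the newest ones at the end. The helper name `per_episode_rewards` and the sample results are hypothetical.

```python
def per_episode_rewards(results):
    """Flatten per-iteration results into one reward value per episode."""
    rewards = []
    for result in results:
        # Assumed keys; adjust them if your result dicts are laid out differently.
        n_new = result.get("episodes_this_iter", 0)
        history = result.get("hist_stats", {}).get("episode_reward", [])
        if n_new > 0:
            # Keep only the episodes finished in this iteration (assumed to be last).
            rewards.extend(history[-n_new:])
    return rewards


# Made-up results standing in for the dicts returned by successive iterations.
fake_results = [
    {"episodes_this_iter": 2, "hist_stats": {"episode_reward": [1.0, 2.0]}},
    {"episodes_this_iter": 1, "hist_stats": {"episode_reward": [1.0, 2.0, 3.0]}},
]
series = per_episode_rewards(fake_results)
print(series)
```

Each entry of `series` could then be written as its own scalar with the episode index as the step, for example with a TensorBoard `SummaryWriter.add_scalar` call.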
