How severely does this issue affect your experience of using Ray?
Low: It annoys or frustrates me for a moment.
I want to monitor the episode reward throughout training, rather than seeing min/max/mean plots. I used the callback below, and my TensorBoard is updated with the custom metric, but it automatically creates mean, min, and max plots instead of just the episode reward itself.
Printing episode.total_reward to the terminal during training gives a single scalar value per episode, as I would expect.
from ray.rllib.algorithms.callbacks import DefaultCallbacks

class CustomCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # Log the total reward for the episode.
        print("Episode Reward: ", episode.total_reward)
        episode.custom_metrics["individual_episode_reward"] = episode.total_reward
Basically, I want what is shown in ray/tune/hist_stats/episode_reward, but as a scalar plot instead of a histogram.
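For context, I attach the callback roughly like this. This is only a sketch: it assumes PPO, the Ray 2.x AlgorithmConfig API, and uses CartPole-v1 as a stand-in for my actual env.

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")   # stand-in for my actual env
    .callbacks(CustomCallbacks)   # pass the callback class, not an instance
)
algo = config.build()

for _ in range(10):
    result = algo.train()
    # The custom metric comes back under result["custom_metrics"], already
    # reduced to *_mean / *_min / *_max over the episodes of that iteration.
    print(result["custom_metrics"])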
The reason it reports a mean, min, and max is that during one iteration it is potentially sampling transitions from multiple episodes. If multiple episodes complete, there will be multiple episode returns, and the metrics report the min, max, and mean across all of those episodes. Hist stats shows you a histogram of every episode return obtained in an iteration.
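Roughly speaking (made-up numbers, and this is a conceptual sketch rather than RLlib source), one iteration's episode returns collapse into the three scalars, while hist_stats keeps the full list:

# Suppose 3 episodes finished during one training iteration.
episode_returns_this_iter = [91.0, 104.5, 88.0]   # made-up returns

# Approximately what ends up in the results / TensorBoard tags:
summary = {
    "custom_metrics/individual_episode_reward_mean": sum(episode_returns_this_iter) / 3,
    "custom_metrics/individual_episode_reward_min": min(episode_returns_this_iter),
    "custom_metrics/individual_episode_reward_max": max(episode_returns_this_iter),
    # hist_stats keeps the whole list, which is what the histogram tab plots:
    "hist_stats/episode_reward": episode_returns_this_iter,
}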
@mannyv What do you mean by iteration? Is it sampling based on whether the config collects batches as complete_episodes vs. truncate_episodes? I guess then I am looking to plot reward vs. episode rather than vs. steps. My env terminates after a set number of steps, so each episode has the same number of steps. I still think it should make sense to have a single per-episode reward plot.
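The setting I mean is, I think, batch_mode; a minimal dict-style config sketch, again with CartPole-v1 as a placeholder for my env:

config = {
    "env": "CartPole-v1",                    # placeholder env
    "batch_mode": "truncate_episodes",       # default: a fragment may cut an episode mid-way
    # "batch_mode": "complete_episodes",     # each sampled batch holds only whole episodes
    "callbacks": CustomCallbacks,
}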
@mannyv
I get the gist of what you are saying, and I understand the difference between mean, min, and max.
So why can't I plot the reward of every completed episode that occurred throughout training, if it's already being collected and sorted per train_batch_size?