Meaning of episode_reward_mean

What is the meaning of the episode_reward_mean metric? Is it the sum of the reward obtained in each time step of the episode? What is the difference between episode_reward_mean and episode_reward_min?

Hi @carlorop,

episode_reward_mean is the mean over all episodes played so far, so this is updated after each episode. episode_reward_min is then the corresponding minimum over all episodes played so far. Hope this helps.


Thanks for your reply @Lars_Simon_Zehnder

Is it computed over all previous episodes, or only over the last n episodes? I am asking because episode_reward_min sometimes increases, which should not happen for a minimum taken over every episode so far.

Hi @carlorop,
RLlib collects a number of metrics. One of those is the episode reward. When creating the summary, it computes and stores the mean, min, and max of those metrics.
The metrics summarize all of the values collected during the previous iteration. What defines an iteration varies by algorithm. With PPO, for example, the episode rewards are those of every episode that completed while collecting the last train_batch_size steps. If no episode returned done during that time, which is possible for environments with very long episodes, the metric would be empty.
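
To make the per-iteration summary concrete, here is a minimal sketch in plain Python. It is not RLlib's actual code: the function name `summarize_episode_rewards` and the sample data are made up for illustration, and it assumes that an episode's reward is the undiscounted sum of its per-step rewards. Returning NaN for an iteration with no completed episodes is also just a choice made for this sketch.

```python
import math
from statistics import mean


def summarize_episode_rewards(episodes):
    """Summarize the episodes that completed during ONE iteration.

    `episodes` is a list of per-step reward lists, one list per completed
    episode (hypothetical input format, chosen for this sketch).
    """
    # Assumption: the reward of an episode is the sum of its step rewards.
    totals = [sum(step_rewards) for step_rewards in episodes]
    if not totals:
        # No episode finished in this iteration, so there is nothing to summarize.
        return {
            "episode_reward_mean": math.nan,
            "episode_reward_min": math.nan,
            "episode_reward_max": math.nan,
            "episodes_this_iter": 0,
        }
    return {
        "episode_reward_mean": mean(totals),
        "episode_reward_min": min(totals),
        "episode_reward_max": max(totals),
        "episodes_this_iter": len(totals),
    }


# Two consecutive iterations with different sets of completed episodes.
iteration_1 = [[1.0, 1.0, 1.0], [1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
iteration_2 = [[1.0] * 5, [1.0] * 6]

summary_1 = summarize_episode_rewards(iteration_1)
summary_2 = summarize_episode_rewards(iteration_2)
print(summary_1["episode_reward_min"], summary_2["episode_reward_min"])
```

Because each summary only looks at the episodes from its own iteration, the minimum can go up from one iteration to the next, which matches what you observed.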
