Meaning of episode_reward_mean

carlorop · October 18, 2021, 10:17am

What is the meaning of the episode_reward_mean metric? Is it the sum of the reward obtained in each time step of the episode? What is the difference between episode_reward_mean and episode_reward_min?

Lars_Simon_Zehnder · October 18, 2021, 12:16pm

Hi @carlorop,

episode_reward_mean is the mean over all episodes played so far, so this is updated after each episode. episode_reward_min is then the corresponding minimum over all episodes played so far. Hope this helps.

carlorop · October 18, 2021, 1:03pm

Thanks for your reply @Lars_Simon_Zehnder

Is it computed over all the previous episodes, or only over the last n episodes? I am asking because the episode_reward_min sometimes increases.

mannyv · October 18, 2021, 1:26pm

Hi @carlorop,
RLlib collects a number of metrics. One of those is the episode_reward. When creating the summary, it computes and stores the mean, min, and max of those metrics.
The metrics summarize all of the values collected during the previous iteration. What defines an iteration varies. If you are using PPO, for example, the episode reward will contain the rewards obtained for every completed episode that occurred while collecting the last train_batch_size steps. If no episode returned done, which is possible for some really long environments, then that metric would be empty.
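
As a rough illustration of the summary described above, here is a minimal sketch in plain Python. It is not RLlib's actual code: the function name `summarize_iteration` and the NaN convention for an iteration with no completed episodes are assumptions made for this example.

```python
import math


def summarize_iteration(episode_rewards):
    """Summarize the total rewards of the episodes completed in one iteration."""
    if not episode_rewards:
        # No episode finished during this iteration, so there is nothing to summarize.
        # Returning NaN here is an assumption of this sketch.
        return {
            "episode_reward_mean": math.nan,
            "episode_reward_min": math.nan,
            "episode_reward_max": math.nan,
        }
    return {
        "episode_reward_mean": sum(episode_rewards) / len(episode_rewards),
        "episode_reward_min": min(episode_rewards),
        "episode_reward_max": max(episode_rewards),
    }


# Each list holds the per-episode totals collected in one iteration (made-up numbers).
iter_1 = summarize_iteration([10.0, 12.0, 8.0])
iter_2 = summarize_iteration([15.0, 11.0])
print(iter_1["episode_reward_min"], iter_2["episode_reward_min"])
```

Because each iteration summarizes its own set of episodes in this sketch, the minimum can go up from one iteration to the next, which matches the behavior asked about earlier in the thread.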

Roller44 · May 10, 2022, 1:42pm

If I want to obtain the reward in the latest episode, what setting should I change?

Lars_Simon_Zehnder · May 11, 2022, 8:10pm

Hi @Roller44, you could create a custom callback that records this value. Another method (but one that consumes more time and disk space) is to use RLlib’s Offline API, which writes out the states and rewards of every timestep, and then extract the last episode’s total reward from that data.

config["output"] = "my/data/path"
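
The custom-callback suggestion above could look roughly like the following sketch. It is written as a plain Python class so that it runs without RLlib installed. In RLlib you would normally subclass `DefaultCallbacks` and override `on_episode_end`, but the import path and the exact hook signature depend on the Ray version, so treat the names below (`LastEpisodeRewardRecorder`, `FakeEpisode`, and the `total_reward` and `custom_metrics` attributes) as assumptions to check against your version.

```python
class LastEpisodeRewardRecorder:
    """Hypothetical stand-in for an RLlib callbacks class (no RLlib import needed)."""

    def __init__(self):
        self.last_episode_reward = None

    def on_episode_end(self, *, episode, **kwargs):
        # Assumption: `episode.total_reward` is the summed reward of the finished episode.
        self.last_episode_reward = episode.total_reward
        # Assumption: entries in `episode.custom_metrics` are reported with the results.
        episode.custom_metrics["last_episode_reward"] = episode.total_reward


class FakeEpisode:
    """Minimal fake episode object, used only to exercise the sketch."""

    def __init__(self, total_reward):
        self.total_reward = total_reward
        self.custom_metrics = {}


recorder = LastEpisodeRewardRecorder()
for reward in (3.0, 7.5):
    ep = FakeEpisode(reward)
    recorder.on_episode_end(episode=ep)

print(recorder.last_episode_reward)
```

The recorder only keeps the most recent value, so after the loop it holds the total reward of the last episode that ended.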

JeanT · January 9, 2023, 7:41pm

Hi,
Despite the many answers above, I am still not sure about the meaning of episode_reward in episode_reward_mean. Maybe that is because English is not my native language, sorry about that.
A/ Does episode_reward mean the RL return (i.e. the sum of discounted rewards in an episode)? If yes, what is the discount value?
B/ Or does it mean the sum of rewards in an episode (i.e. the same as A if and only if the discount value = 1.0)?
C/ Or does it mean a single reward within any episode?
I have no problem with mean terminology.
Thanks

mannyv · January 10, 2023, 12:09am

Hi @JeanT,

The episode reward is the sum of all the rewards for each timestep in an episode. Yes, you could think of it as discount=1.0.

The mean is taken over the number of episodes, not timesteps. The number of episodes is the number of new episodes sampled during the rollout phase, or during evaluation if it is an evaluation metric.
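
To make the two points above concrete, here is a small sketch in plain Python with made-up reward values. The helper names `episode_reward` and `discounted_return` are hypothetical and only serve this example; they are not RLlib functions.

```python
def episode_reward(step_rewards):
    """Undiscounted sum of the per-timestep rewards of one episode."""
    return sum(step_rewards)


def discounted_return(step_rewards, gamma):
    """Discounted return, shown only for comparison with the undiscounted sum."""
    return sum(r * gamma**t for t, r in enumerate(step_rewards))


# Three episodes of different lengths (made-up per-timestep rewards).
episodes = [[1.0, 1.0, 1.0], [0.0, 2.0], [5.0]]

totals = [episode_reward(ep) for ep in episodes]

# Mean over episodes: this is what the answer above describes.
episode_reward_mean = sum(totals) / len(totals)

# Mean over timesteps: a different quantity, shown for contrast.
per_step_mean = sum(totals) / sum(len(ep) for ep in episodes)

print(totals, episode_reward_mean, per_step_mean)
```

With `gamma=1.0`, `discounted_return` gives the same value as `episode_reward`, which is the sense in which the episode reward corresponds to a discount of 1.0.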

XavierM · January 10, 2023, 3:15pm

I concur with @mannyv (i.e. solution B). As an additional explanation, in a multi-agent RL configuration, the episode_reward_mean reported in json_object is the sum of the episode_reward_mean values obtained by each RL agent.
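
A tiny sketch of the multi-agent case described above, in plain Python. The agent IDs and reward values are made up for illustration, and the code only mirrors the description in this post, not RLlib's internals.

```python
# Per-timestep rewards of each agent within one episode (hypothetical values).
agent_step_rewards = {
    "agent_0": [1.0, 0.5],
    "agent_1": [2.0, -1.0],
}

# Total reward per agent for this episode.
per_agent_total = {agent: sum(rewards) for agent, rewards in agent_step_rewards.items()}

# Episode-level reward as the sum over all agents.
episode_total = sum(per_agent_total.values())

print(per_agent_total, episode_total)
```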

fardinabbasi · September 21, 2023, 11:54am

What other metrics are available by default besides episode_reward_mean ? For example, is there any metric for step reward?

Lars_Simon_Zehnder · September 21, 2023, 2:36pm

@fardinabbasi episode_reward_min and episode_reward_max are also available.
