What is the meaning of the `episode_reward_mean` metric? Is it the sum of the rewards obtained in each time step of the episode? What is the difference between `episode_reward_mean` and `episode_reward_min`?
Hi @carlorop,
`episode_reward_mean` is the mean over all episodes played so far, so it is updated after each episode. `episode_reward_min` is the corresponding minimum over all episodes played so far. Hope this helps.
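To make the relationship between these metrics concrete, here is a minimal plain-Python sketch (illustrative only, not RLlib source; all names below are made up for the example). An episode's reward is the plain sum of the rewards received at each of its timesteps, and the summary metrics aggregate over episodes:

```python
def episode_reward(step_rewards):
    """Total (undiscounted) reward of one episode: the sum over its timesteps."""
    return sum(step_rewards)

# Three finished episodes, each given as a list of per-step rewards.
episodes = [
    [1.0, 0.5, 2.0],   # episode reward = 3.5
    [0.0, 1.0],        # episode reward = 1.0
    [2.0, 2.0, 2.0],   # episode reward = 6.0
]

rewards = [episode_reward(ep) for ep in episodes]
episode_reward_mean = sum(rewards) / len(rewards)  # 3.5
episode_reward_min = min(rewards)                  # 1.0
episode_reward_max = max(rewards)                  # 6.0
```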
Thanks for your reply @Lars_Simon_Zehnder.
Is it computed over all previous episodes, or only over the last n episodes? I am asking because sometimes `episode_reward_min` increases.
Hi @carlorop,
RLlib collects a number of metrics. One of these is the episode reward. When creating the summary, it computes and stores the mean, min, and max of those metrics.
The metrics summarize all of the values collected during the previous iteration. What defines an iteration varies. If you are using PPO, for example, the episode reward will contain the rewards obtained for every completed episode that occurred while collecting the last `train_batch_size` steps. If no episode returned done, which is possible for some really long environments, then that metric would be empty.
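This per-iteration behavior also explains why the reported minimum can go up: each iteration's summary only covers episodes that completed during that iteration. A small sketch (hypothetical helper, not RLlib code):

```python
def iteration_metrics(completed_episode_rewards):
    """Summarize only the episodes that finished during one iteration."""
    if not completed_episode_rewards:
        # No episode returned done during this iteration: nothing to report.
        return {"episode_reward_mean": None,
                "episode_reward_min": None,
                "episode_reward_max": None}
    n = len(completed_episode_rewards)
    return {
        "episode_reward_mean": sum(completed_episode_rewards) / n,
        "episode_reward_min": min(completed_episode_rewards),
        "episode_reward_max": max(completed_episode_rewards),
    }

first = iteration_metrics([3.0, 5.0])   # min = 3.0 for this iteration
second = iteration_metrics([7.0, 9.0])  # min = 7.0: the reported min increased
empty = iteration_metrics([])           # iteration with no finished episodes
```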
If I want to obtain the reward of the latest episode, what setting should I change?
Hi @Roller44, you could create a custom callback that writes this value out. Another method (more time- and disk-consuming) is to use RLlib’s Offline API, which records every timestep’s states and rewards; you can then extract the last episode’s total reward from that data:
config["output"] = "my/data/path"
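The callback approach can be sketched in plain Python as below. Note this is NOT the RLlib API; the class and method names here are hypothetical stand-ins. In RLlib you would put equivalent logic in a custom callbacks class registered via the `callbacks` config key:

```python
class LatestEpisodeRewardTracker:
    """Records the total reward of the most recently finished episode."""

    def __init__(self):
        self.latest_episode_reward = None

    def on_episode_end(self, episode_step_rewards):
        # Called once per finished episode with its per-step rewards.
        self.latest_episode_reward = sum(episode_step_rewards)


tracker = LatestEpisodeRewardTracker()
tracker.on_episode_end([1.0, 2.0, 3.0])  # first episode, total 6.0
tracker.on_episode_end([0.5, 0.5])       # latest episode, total 1.0
```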
Hi,
Despite the many answers above, I am still not sure about the meaning of episode_reward in `episode_reward_mean`. Maybe because English is not my mother tongue, sorry about that.
A/ Does episode_reward mean the RL return (i.e. the sum of discounted rewards in an episode)? If yes, what is the discount value?
B/ Or does it mean the sum of rewards in an episode (i.e. the same as A iff the discount value = 1.0)?
C/ Or does it mean a single reward within any episode?
I have no problem with the mean terminology.
Thanks
Hi @JeanT,
The episode reward is the sum of all the rewards for each timestep in an episode. Yes, you could think of it as discount = 1.0, so your B.
The mean is taken over the number of episodes, not timesteps. The number of episodes is the number of new episodes sampled during the rollout phase (or during evaluation, if it is an evaluation metric).
I concur with @mannyv (i.e. the B interpretation). As an additional note: in a multi-agent RL configuration, the reported `episode_reward_mean` in the result object is the sum of the `episode_reward_mean` values obtained by each RL agent.
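A tiny sketch of that multi-agent case (illustrative only; agent names are made up): the reported episode reward for one episode is the sum of the per-agent episode rewards.

```python
# Per-agent total rewards for a single finished multi-agent episode.
per_agent_episode_rewards = {"agent_0": 4.0, "agent_1": 2.5, "agent_2": -1.0}

# The episode-level reward reported for this episode is their sum.
episode_reward = sum(per_agent_episode_rewards.values())  # 5.5
```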
What other metrics are available by default besides `episode_reward_mean`? For example, is there any metric for the step reward?
@fardinabbasi there are also `episode_reward_min` and `episode_reward_max` available.