Meaning of episode_reward_mean

Hi @carlorop,
RLlib collects a number of metrics, one of which is episode_reward. When it builds the summary, it computes and stores the mean, min, and max of those metrics.
The metrics summarize all of the values collected during the previous iteration. What defines an iteration varies by algorithm. With PPO, for example, the episode reward covers every episode that completed while collecting the last train_batch_size steps. If no episode returned done during that window, which is possible for environments with very long episodes, the metric would be empty.
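To make the aggregation concrete, here is a minimal sketch in plain Python of how such a summary could be computed from the total rewards of the episodes that completed during one iteration. This is not RLlib's actual code: the function name `summarize_episode_rewards` is hypothetical, and returning NaN for an iteration with no completed episodes is an assumption made for illustration.

```python
import math
from statistics import mean


def summarize_episode_rewards(episode_rewards):
    """Summarize the total rewards of episodes completed in one iteration.

    Hypothetical helper for illustration only, not part of RLlib.
    `episode_rewards` holds one total reward per completed episode.
    """
    if not episode_rewards:
        # Assumption: with no completed episodes there is nothing to
        # average, so every statistic is reported as NaN.
        return {
            "episode_reward_mean": math.nan,
            "episode_reward_min": math.nan,
            "episode_reward_max": math.nan,
        }
    return {
        "episode_reward_mean": mean(episode_rewards),
        "episode_reward_min": min(episode_rewards),
        "episode_reward_max": max(episode_rewards),
    }


# Three episodes finished while the last batch of steps was collected.
summary = summarize_episode_rewards([10.0, 12.5, 7.5])
print(summary)

# No episode finished during the iteration (very long episodes).
empty_summary = summarize_episode_rewards([])
print(empty_summary)
```

Episodes that are still running when the batch is complete contribute nothing to these numbers in this sketch; only finished episodes are counted.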