I would like to discuss an interesting observation I recently made regarding the episode_reward_mean that is logged for each training iteration.
The episode_reward_mean visible in the progress.csv file is always calculated over ALL episodes that have been executed in the trial. The underlying data can be found in hist_stats/episode_reward.
However, what do you think about an episode_reward_mean_per_iteration? That metric would compute the mean over ONLY the NEW episodes that occurred in the current iteration (a sketch of what I mean follows the list below).
I see the following benefits:
A measure of convergence to a (local) optimum of the reward function
Better judgement of solution quality, in the sense of “will any more iterations make sense?”
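As a rough illustration of what I have in mind, here is a minimal sketch of how such a metric could be derived from the standard result dict, assuming a recent Ray 2.x layout where result["episodes_this_iter"] counts newly finished episodes and result["hist_stats"]["episode_reward"] holds recent episode returns with the newest entries last. The helper name episode_reward_mean_per_iteration and the callback class are my own invention, not an existing RLlib API:

```python
import numpy as np
from ray.rllib.algorithms.callbacks import DefaultCallbacks


def episode_reward_mean_per_iteration(result: dict) -> float:
    """Mean return over ONLY the episodes finished in the current iteration.

    Assumes the standard RLlib result dict layout:
    result["episodes_this_iter"] counts newly finished episodes, and
    result["hist_stats"]["episode_reward"] holds recent episode returns
    with the newest entries at the end.
    """
    n = result["episodes_this_iter"]
    if n == 0:
        return float("nan")  # no episode finished during this iteration
    rewards = result["hist_stats"]["episode_reward"]
    # Only the last n entries were produced by the current iteration.
    return float(np.mean(rewards[-n:]))


class PerIterationRewardCallback(DefaultCallbacks):
    """Hypothetical callback that logs the per-iteration mean alongside
    the built-in smoothed episode_reward_mean."""

    def on_train_result(self, *, algorithm, result, **kwargs):
        result["episode_reward_mean_per_iteration"] = (
            episode_reward_mean_per_iteration(result)
        )
```

If such a callback class were registered on the config (e.g. via config.callbacks(PerIterationRewardCallback)), the new key would show up in progress.csv next to the existing smoothed metric.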
As far as I am aware, it does not use all episodes in the trial; it only uses a configurable number of the most recent n episodes. This can be configured in the reporting options; the default is 100.
metrics_num_episodes_for_smoothing – Smooth rollout metrics over this many episodes, if possible. In case rollouts (sample collection) just started, there may be fewer than this many episodes in the buffer and we’ll compute metrics over this smaller number of available episodes. In case there are more than this many episodes collected in a single training iteration, use all of these episodes for metrics computation, meaning don’t ever cut any “excess” episodes. Set this to 1 to disable smoothing and to always report only the most recently collected episode’s return.
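For completeness, here is how that window can be set through the reporting options, assuming the Ray 2.x AlgorithmConfig API (PPO and CartPole-v1 are used purely for illustration):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # A window of 1 disables smoothing: episode_reward_mean then reports
    # only the most recently collected episode's return.
    .reporting(metrics_num_episodes_for_smoothing=1)
)

algo = config.build()
result = algo.train()
print(result["episode_reward_mean"])
```

With a window of 1 the reported value behaves much like the per-iteration metric you describe, except that it covers a single episode rather than all episodes of the iteration.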