Hist_stats/episode_reward in new API stack

Hello Ray community,

What happened to the hist_stats/episode_reward column in the progress.csv file of the old RLlib API stack?

It was useful for keeping track of the rewards reached in each episode over a full trial with multiple training iterations.

Furthermore, in the old API stack a workaround was needed when processing the hist_stats/episode_reward column of progress.csv: after the first training iteration in which episodes_total exceeded 100, hist_stats/episode_reward always cut off the oldest values when the new values of the current iteration were appended. How does this work in the new API stack?

In the old RLlib API stack, hist_stats/episode_reward in progress.csv stored the most recent N episode rewards (default N=100, controlled by metrics_num_episodes_for_smoothing). Once more than N episodes had been completed, the oldest rewards were dropped, so the vector always held at most N values. This required care when post-processing progress.csv: the vector was a truncated sliding window, not a cumulative record of the entire trial (discuss.ray.io: constant episode reward mean, discuss.ray.io: rllib-tune-ppo-episode-reward-mean).
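
One way to reconstruct the full series from the old-stack progress.csv is to diff each sliding window against the cumulative episodes_total column. A minimal sketch, assuming hist_stats/episode_reward is stored as a string-encoded list and episodes_total is present (both standard columns in old-stack progress.csv):

```python
import ast

import pandas as pd

# Sketch (old API stack): rebuild the full episode-reward series from the
# sliding-window "hist_stats/episode_reward" column of progress.csv.
df = pd.read_csv("progress.csv")

all_rewards = []
prev_total = 0
for raw, total in zip(df["hist_stats/episode_reward"], df["episodes_total"]):
    window = ast.literal_eval(raw)  # e.g. "[1.0, 2.5, ...]" -> list
    n_new = total - prev_total  # episodes completed in this iteration
    prev_total = total
    # Take the newest entries of the window; if more episodes than the
    # window size finished in one iteration, the excess is unrecoverable.
    if n_new > 0:
        all_rewards.extend(window[-n_new:])
```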

In the new API stack, this behavior is unchanged: hist_stats/episode_reward still only contains the most recent metrics_num_episodes_for_smoothing episode rewards, and older values are dropped as new ones are added. The smoothing window size is still configurable, and the vector is not cumulative over the full trial. If you want to keep a full history of all episode rewards, you must explicitly log and store them yourself (discuss.ray.io: constant episode reward mean, github.com: algorithm.py). Would you like more detail or code examples on how to log all episode rewards in the new API stack?
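
For completeness, the window size is configured through the reporting settings of AlgorithmConfig; a minimal sketch (PPO is just an example algorithm here):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: widen the smoothing window so more recent episode returns are
# retained in the reported metrics (the default window is 100 episodes).
config = PPOConfig().reporting(metrics_num_episodes_for_smoothing=300)
```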


@RunLLM : I can no longer find any column matching "hist" or "reward" in progress.csv. Is there a config option that has to be activated for that in Ray 2.53?

In Ray 2.53, the progress.csv file does not include hist_stats/episode_reward or similar columns by default. To store per-episode rewards (hist_stats), you must set the config option keep_per_episode_custom_metrics = True in your AlgorithmConfig. Without this, only aggregated metrics (like means) are saved, not the full per-episode history (discuss.ray.io: is-hist-stats-no-longer-supported, discuss.ray.io: accessing-custom-metrics-for-episodes).
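
If you go this route, the option is set through the reporting settings of AlgorithmConfig; a minimal sketch (PPO again only as an example):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: keep per-episode custom metrics un-aggregated instead of
# reducing them to mean/min/max values.
config = PPOConfig().reporting(keep_per_episode_custom_metrics=True)
```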

If you enable keep_per_episode_custom_metrics, the per-episode custom metrics will be available in the results dictionary, but not necessarily in progress.csv. You may need to access them from result.json or directly from the results object in your code (discuss.ray.io: is-hist-stats-no-longer-supported). Would you like more detail on how to access or log these metrics?
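
Post-processing result.json is straightforward, since it is newline-delimited JSON with one record per training iteration. A minimal sketch; the key path under "env_runners" is an assumption that depends on your Ray version and metric names, so inspect one record first:

```python
import json

# Sketch: result.json holds one JSON record per training iteration.
returns_per_iteration = []
with open("result.json") as f:
    for line in f:
        result = json.loads(line)
        # Key path is an assumption; adapt it to your actual metric name.
        value = result.get("env_runners", {}).get("episode_return_mean")
        returns_per_iteration.append(value)
```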


@RunLLM : I used the MetricsLogger API in my custom callback to keep track of episode_return.

To track episode_return using the MetricsLogger API in your custom callback, call metrics_logger.log_value("episode_return", value, reduce="mean", window=N) in your on_episode_end method. This will log the mean episode return over a sliding window of N episodes, and the result will be available in results["env_runners"]["episode_return"] after each training iteration (doc/source/rllib/metrics-logger.rst).
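
A minimal sketch of such a callback (the RLlibCallback import path and episode.get_return() are as in recent new-API-stack releases; in older versions the base class was DefaultCallbacks, and the metric name below is hypothetical):

```python
from ray.rllib.callbacks.callbacks import RLlibCallback

class EpisodeReturnCallback(RLlibCallback):
    def on_episode_end(self, *, episode, metrics_logger, **kwargs):
        # Log the finished episode's return; RLlib reports it as the mean
        # over a sliding window of the last 100 episodes.
        metrics_logger.log_value(
            "episode_return_custom",  # hypothetical metric name
            episode.get_return(),
            reduce="mean",
            window=100,
        )
```

Register the callback via config.callbacks(EpisodeReturnCallback); the windowed value then appears under results["env_runners"].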

If you want to log all episode returns (not just the mean), set reduce=None and handle the aggregation yourself, as the default behavior is to aggregate over the window (doc/source/rllib/metrics-logger.rst). Would you like a code example?


@RunLLM : Yes, I would need the series of returns saved, but it should remain manageable in the progress.csv file in the case of many episodes (~200-300 per trial).

The MetricsLogger API can log a series of episode returns by calling metrics_logger.log_value("episode_return", value, reduce=None, clear_on_reduce=True) in your callback. However, when reduce=None, the resulting list of episode returns is stored in results["env_runners"]["episode_return"], but in progress.csv this will appear as a list or array per row, which can become unwieldy for many episodes (github.com/ray-project/ray/issues/50294).
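
A minimal sketch of that variant (same callback structure as above; the metric name is again hypothetical):

```python
from ray.rllib.callbacks.callbacks import RLlibCallback

class EpisodeReturnSeries(RLlibCallback):
    def on_episode_end(self, *, episode, metrics_logger, **kwargs):
        # reduce=None keeps every logged value as a list; clear_on_reduce=True
        # empties it after each reporting cycle, so each iteration's results
        # contain only the episodes finished in that iteration.
        metrics_logger.log_value(
            "episode_return_series",  # hypothetical metric name
            episode.get_return(),
            reduce=None,
            clear_on_reduce=True,
        )
```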

Currently, there is no built-in config option to automatically store a cumulative, easily post-processable series of all episode returns in progress.csv for large numbers of episodes. For robust post-processing, it is recommended to write the series to a separate file (e.g., JSON or CSV) from your callback, or post-process result.json, as progress.csv is not designed for large arrays per row (github.com/ray-project/ray/issues/50294). Would you like a code example for writing episode returns to a separate file?
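
Since you asked: a minimal sketch that appends every finished episode's return to a standalone CSV directly from the callback. The file name and location are assumptions (each EnvRunner writes into its own working directory); with several EnvRunners, use per-worker file names to avoid concurrent writes:

```python
import csv
import os

from ray.rllib.callbacks.callbacks import RLlibCallback

class EpisodeReturnWriter(RLlibCallback):
    """Appends each finished episode's return to a standalone CSV file."""

    OUT_FILE = "episode_returns.csv"  # hypothetical path, per-EnvRunner cwd

    def on_episode_end(self, *, episode, **kwargs):
        new_file = not os.path.exists(self.OUT_FILE)
        with open(self.OUT_FILE, "a", newline="") as f:
            writer = csv.writer(f)
            if new_file:
                writer.writerow(["episode_id", "episode_return"])
            writer.writerow([episode.id_, episode.get_return()])
```

For 200-300 episodes per trial this stays small, and it keeps progress.csv free of large per-row arrays.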
