Hi @LeoLeoLeo,
You were already looking at the correct value to track reward during training: `episode_reward_mean`. Each time it is logged, it is the mean over the most recent 100 completed episodes; those are the ones stored in `hist_stats`. If no episodes complete during the sample phase, the value will stay the same, since `hist_stats` will not change either.
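For example, here is a minimal sketch of reading those values from the results dict returned by `train()` (assuming the legacy results layout that exposes `hist_stats`; CartPole is just a placeholder environment):

```python
from ray.rllib.algorithms.ppo import PPOConfig

algo = PPOConfig().environment("CartPole-v1").build()

result = algo.train()
# Mean reward over the smoothing window of most recent completed episodes.
print(result["episode_reward_mean"])
# The per-episode returns that the mean is computed from.
print(result["hist_stats"]["episode_reward"])
```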
You can control the number of episodes kept in `hist_stats` with the reporting argument `metrics_num_episodes_for_smoothing`.
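For example, a sketch of setting it through the config API (the window size of 20 is just an illustration):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # Smooth episode_reward_mean over the last 20 completed episodes
    # instead of the default 100.
    .reporting(metrics_num_episodes_for_smoothing=20)
)
algo = config.build()
```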
What are you trying to accomplish with the truncation? Unless you have a specific need, for example your environment never terminates, or you want to enforce a maximum length so that an episode times out if an agent gets stuck, there is no need to truncate an episode. In many cases, variable-length episodes and episodes of unknown length work just fine in RLlib and PPO without any special treatment.