What's the difference between episode_return_mean per iteration and episode_reward?

I am experimenting with a program using DQN to control traffic lights with the SUMO-RL environment. I noticed there is a difference between episode_return_mean of each iteration and episode_reward.

When I print out results['env_runners']['episode_return_mean'] for each iteration, the values are [-324.85, -249.24, -152.9275, -127.304, -127.26833, -106.62375, -100.06777, -85.0690, -78.4025, -73.9646, -76.87, -78.1725, -72.2861, -68.92, -65.8255]

But after the training process finished, I got results['env_runners']['hist_stats']['episode_reward'] with the values: [-324.85, -173.63, -103.52, -9.71, -24.81, -127.09, -52.44, -36.94, -47.62, -25.08, -10.07, -5.07, -20.71, -70.3, -121.21, -97.71, -43.54, -6.85, -8.34, -7.02]

I don’t understand the difference between episode_return_mean and episode_reward. Please explain it to me. Thanks a lot.

@Do_Giang welcome to the forum and thanks for posting this question.

May I ask which Ray version you are running your experiment on? The current state of logging is:

res["env_runners"]["hist_stats"]["episode_reward"]

stores the history of reward sums per episode and

res["env_runners"]["episode_return_mean"]

defines the average of these episode reward sums. This might have been a bit different in older versions of Ray, because there was still a hybrid stack that is deprecated and should not be used. We recommend every user switch to the new API stack (see the migration guide), as the old stack will be fully removed in the very near future.
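
To make this concrete, here is a minimal sketch in plain Python (not RLlib internals) that reproduces your last reported episode_return_mean from the per-episode returns in hist_stats. It assumes the default smoothing window of 100 episodes, which is larger than the 20 episodes collected here:

```python
# Per-episode returns as reported in
# results["env_runners"]["hist_stats"]["episode_reward"]
episode_rewards = [
    -324.85, -173.63, -103.52, -9.71, -24.81, -127.09, -52.44, -36.94,
    -47.62, -25.08, -10.07, -5.07, -20.71, -70.3, -121.21, -97.71,
    -43.54, -6.85, -8.34, -7.02,
]

# episode_return_mean is the (windowed) average over these episode sums.
# While fewer episodes than the smoothing window have completed, it is
# simply the mean over all episodes collected so far.
window = 100  # RLlib's default smoothing window (assumption)
recent = episode_rewards[-window:]
episode_return_mean = sum(recent) / len(recent)

print(episode_return_mean)  # -65.8255
```

The mean over the 20 episode returns works out to -65.8255, which matches the final episode_return_mean you printed per iteration.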

I checked the Ray version with ray --version and it returns ray, version 2.37.0. I'm going to try the new API stack that you suggested above.
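
Concretely, I plan to start from something like the minimal sketch below to opt in to the new API stack on my DQN config. The api_stack() flag names and the CartPole-v1 placeholder environment are my assumptions from the Ray 2.x docs; I'll double-check them against the migration guide:

```python
from ray.rllib.algorithms.dqn import DQNConfig

# Minimal sketch: explicitly opt in to the new API stack on a DQN config.
# The api_stack() flags below are assumptions based on recent Ray 2.x
# releases; check the migration guide for your exact version.
config = (
    DQNConfig()
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    # Placeholder env; the SUMO-RL environment would be registered separately.
    .environment("CartPole-v1")
    .env_runners(num_env_runners=1)
)

algo = config.build()
result = algo.train()

# On the new stack, per-iteration metrics live under result["env_runners"],
# e.g. the windowed mean of episode returns:
print(result["env_runners"]["episode_return_mean"])
```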

Thanks for your support.
