How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am trying to train PPO on a custom environment with external data. However, I can't seem to get 'episode_reward_mean' from the trained model. Following is an MWE:
from ray.rllib.algorithms.ppo import PPOConfig
import gymnasium as gym
from ray.tune.registry import register_env

# Register a named environment and build a PPO algorithm for it.
env_name = "CartPole-v1"
register_env(env_name, lambda env_config: gym.make(env_name))
ppo = PPOConfig().environment(env=env_name).build()

# Train for a few iterations and report the mean episode reward.
for i in range(10):
    result = ppo.train()
    print(f"Iteration {i}: Episode Reward Mean: {result['episode_reward_mean']}")
When executing this script, I get the following error:
print(f"Iteration {i}: Episode Reward Mean: {result['episode_reward_mean']}")
~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'episode_reward_mean'
I am using Ray 2.40.0 with Python 3.11.0 on a Windows machine.
Looking forward to hearing from the community.
Hey @Abid_Ali ,
If you print the result, or look at the .csv file Ray writes into its log directory, you can see which key the reward is saved under. For me, and I would imagine for you too, it is result['env_runners']['episode_reward_mean']. You also have access to the episode length via result['env_runners']['episode_len_mean'], the losses, and much more.
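For example, here is a minimal sketch of reading that value defensively (the helper name is hypothetical; it simply assumes the stats sit in a nested dict under 'env_runners', and that the key is either 'episode_reward_mean' or 'episode_return_mean', which can vary between Ray versions and API stacks):

def get_mean_episode_reward(result: dict):
    # Look under the 'env_runners' sub-dict and accept either key spelling.
    env_runner_stats = result.get("env_runners", {})
    for key in ("episode_reward_mean", "episode_return_mean"):
        if key in env_runner_stats:
            return env_runner_stats[key]
    return None

# Reusing the `ppo` object from your MWE:
for i in range(10):
    result = ppo.train()
    print(f"Iteration {i}: Episode Reward Mean: {get_mean_episode_reward(result)}")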
Let me know if you still need help.
Tyler
Thanks for your response.
I tried to find the .csv file in the logs but couldn't see any.
However, when printing result['env_runners'], I can see the following output for my last iteration, yet there is nothing keyed under 'episode_reward_mean':
{'module_episode_returns_mean': {'default_policy': 280.49}, 'weights_seq_no': 9.0, 'num_env_steps_sampled': 4000, 'num_agent_steps_sampled': {'default_agent': 4000}, 'episode_len_mean': 280.49, 'episode_len_min': 12, 'episode_len_max': 500, 'episode_return_mean': 280.49, 'agent_episode_returns_mean': {'default_agent': 280.49}, 'episode_return_min': 12.0, 'num_episodes_lifetime': 496, 'num_agent_steps_sampled_lifetime': {'default_agent': 40000}, 'num_module_steps_sampled_lifetime': {'default_policy': 40000}, 'num_episodes': 14, 'episode_return_max': 500.0, 'episode_duration_sec_mean': 0.12441997800007812, 'sample': 0.9800657424715093, 'num_module_steps_sampled': {'default_policy': 4000}, 'num_env_steps_sampled_lifetime': 40000, 'time_between_sampling': 4.7981498583133355}
Could this 'module_episode_returns_mean': {'default_policy': 280.49} or 'episode_return_mean': 280.49 be considered the equivalent of the reward mean?
EDIT: I tried it on a custom environment and got what I was looking for, i.e., 'episode_reward_mean', the same way you suggested. So maybe I was not running the MWE for enough iterations before posting my earlier reply (or something else, I'm not sure).
Thanks for helping me out with this.
Cheers!
@Abid_Ali, in case you are training a single agent, ["module_episode_returns_mean"]["default_policy"] is equivalent (as is ["agent_episode_returns_mean"]["default_agent"]). However, in your printout I also see the plain episode_return_mean.
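Just to illustrate with the numbers from your own printout, a small sketch (assuming single-agent training and the key layout you posted, which may differ in other Ray versions):

# All three entries report the same mean per-episode return for a single agent.
stats = result["env_runners"]
overall = stats["episode_return_mean"]                                # 280.49
per_module = stats["module_episode_returns_mean"]["default_policy"]   # 280.49
per_agent = stats["agent_episode_returns_mean"]["default_agent"]      # 280.49
assert overall == per_module == per_agent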