How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am trying to train PPO on a custom environment with external data. However, I can't seem to get 'episode_reward_mean' from the trained model. Following is an MWE:
from ray.rllib.algorithms.ppo import PPOConfig
import gymnasium as gym
from ray.tune.registry import register_env

# Register a named environment and build a PPO algorithm for it.
env_name = "CartPole-v1"
register_env(env_name, lambda env_config: gym.make(env_name))
ppo = PPOConfig().environment(env=env_name).build()

# Train for a few iterations and report the mean episode reward.
for i in range(10):
    result = ppo.train()
    print(f"Iteration {i}: Episode Reward Mean: {result['episode_reward_mean']}")
When executing this script, I get the following error:
print(f"Iteration {i}: Episode Reward Mean: {result['episode_reward_mean']}")
~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'episode_reward_mean'
I am using Ray 2.40.0 with Python 3.11.0 on a Windows machine.
Looking forward to hearing from the community.
Hey @Abid_Ali ,
If you print the result, or look at the .csv file Ray writes into its log directory, you can see which key the reward is saved under. For me, and I would imagine for you too, it is result['env_runners']['episode_reward_mean']. You also have access to the episode length via result['env_runners']['episode_len_mean'], the losses, and much more.
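For example, here is a minimal sketch of reading that value defensively (the helper name is hypothetical; it simply assumes the stats sit in a nested dict under 'env_runners', and that the key is either 'episode_reward_mean' or 'episode_return_mean', which can vary between Ray versions and API stacks):

def get_mean_episode_reward(result: dict):
    # Look under the 'env_runners' sub-dict and accept either key spelling.
    env_runner_stats = result.get("env_runners", {})
    for key in ("episode_reward_mean", "episode_return_mean"):
        if key in env_runner_stats:
            return env_runner_stats[key]
    return None

# Reusing the `ppo` object from your MWE:
for i in range(10):
    result = ppo.train()
    print(f"Iteration {i}: Episode Reward Mean: {get_mean_episode_reward(result)}")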
Let me know if you still need help.
Tyler
Thanks for your response.
I tried to find the .csv file in the logs but couldn't see any.
However, when printing result['env_runners'], I can see the following output for my last iteration, yet there is nothing keyed under 'episode_reward_mean':
{'module_episode_returns_mean': {'default_policy': 280.49}, 'weights_seq_no': 9.0, 'num_env_steps_sampled': 4000, 'num_agent_steps_sampled': {'default_agent': 4000}, 'episode_len_mean': 280.49, 'episode_len_min': 12, 'episode_len_max': 500, 'episode_return_mean': 280.49, 'agent_episode_returns_mean': {'default_agent': 280.49}, 'episode_return_min': 12.0, 'num_episodes_lifetime': 496, 'num_agent_steps_sampled_lifetime': {'default_agent': 40000}, 'num_module_steps_sampled_lifetime': {'default_policy': 40000}, 'num_episodes': 14, 'episode_return_max': 500.0, 'episode_duration_sec_mean': 0.12441997800007812, 'sample': 0.9800657424715093, 'num_module_steps_sampled': {'default_policy': 4000}, 'num_env_steps_sampled_lifetime': 40000, 'time_between_sampling': 4.7981498583133355}
Could this 'module_episode_returns_mean': {'default_policy': 280.49} or 'episode_return_mean': 280.49 be considered the equivalent of the reward mean?
EDIT: I tried it on a custom environment and got what I was looking for, i.e., 'episode_reward_mean', the same way you suggested. So maybe I was not running the MWE for enough iterations before posting my earlier reply (or something else, I'm not sure).
Thanks for helping me out with this.
Cheers!
@Abid_Ali, in case you are training a single agent, ["module_episode_returns_mean"]["default_policy"] is equivalent (as is ["agent_episode_returns_mean"]["default_agent"]). However, in your printout I also see the plain episode_return_mean.
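Just to illustrate with the numbers from your own printout, a small sketch (assuming single-agent training and the key layout you posted, which may differ in other Ray versions):

# All three entries report the same mean per-episode return for a single agent.
stats = result["env_runners"]
overall = stats["episode_return_mean"]                                # 280.49
per_module = stats["module_episode_returns_mean"]["default_policy"]   # 280.49
per_agent = stats["agent_episode_returns_mean"]["default_agent"]      # 280.49
assert overall == per_module == per_agent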