How to obtain single episode reward?

From here, I learned that the “episode_reward_mean” metric indicates the average of rewards collected in all previous episodes.

How can I obtain a single episode reward? Should I make a custom metric for it in the callback class?

If you look at your progress.csv files in ~/ray_results, you can see that the individual episode rewards are already saved under “hist_stats/episode_reward”. If you have the train_results (like ray/action_masking.py at a7d552ca2541376b87a40bc6b2189bab5a5c6c5a · ray-project/ray · GitHub) as a variable, you can access them with train_results[“hist_stats”][“episode_reward”].
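A minimal sketch of that access pattern. The nested dict below is mocked so the snippet runs without Ray; in real use, train_results is whatever trainer.train() returns, with the same "hist_stats" layout:

```python
# Mocked result dict standing in for the return value of trainer.train().
# Only the keys we read here are filled in; a real result has many more.
train_results = {
    "episode_reward_mean": 5.0,
    "hist_stats": {
        "episode_reward": [3.0, 5.0, 7.0],  # one entry per completed episode
        "episode_lengths": [10, 12, 9],
    },
}

# Per-episode rewards, not averaged:
episode_rewards = train_results["hist_stats"]["episode_reward"]
latest_episode_reward = episode_rewards[-1]  # reward of the most recent episode
```

Note that "hist_stats" is a rolling window over recent episodes, so it is a short history rather than the full run.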


@Roller44 Also take a look at custom_metrics_and_callbacks.py to see how to access the metrics and create your own to track in TensorBoard.

It works! Thanks a lot!

A follow-up question: is there a way to access custom metrics without them being averaged over previous episodes?

In other words, is there a way to access xxx itself instead of xxx_min, xxx_mean, and xxx_max?

That might not work for me, because I can only access the average of the custom metric values collected over all previous episodes.

Is there a way to access custom_metric instead of custom_metric_min, custom_metric_mean, and custom_metric_max?

It is a little unclear where you need to access the episode reward. Within the Episode object, the individual rewards collected over the timesteps should be available in Episode.hist_data.

If you need to track these values, subclassing DefaultCallbacks gives you access to the Episode instance in on_episode_end(). This lets you create your own metrics from the single-episode rewards, both for monitoring in TensorBoard and for tuning (and also for post-hoc analysis).
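A sketch of that hook pattern. In real code you would subclass DefaultCallbacks from ray.rllib (the exact import path varies by Ray version); here a plain class and a fake Episode stand in so the logic runs without Ray installed:

```python
# Stand-in for a DefaultCallbacks subclass (no Ray import needed to run this).
class SingleEpisodeRewardCallbacks:
    def on_episode_end(self, *, episode, **kwargs):
        # episode.total_reward is the summed reward of the episode that just
        # ended; writing it into episode.custom_metrics makes RLlib report it
        # (as min/mean/max per training iteration) to TensorBoard.
        episode.custom_metrics["single_episode_reward"] = episode.total_reward


# Tiny stand-in for the Episode object RLlib passes to the hook:
class FakeEpisode:
    def __init__(self, total_reward):
        self.total_reward = total_reward
        self.custom_metrics = {}

ep = FakeEpisode(total_reward=4.5)
SingleEpisodeRewardCallbacks().on_episode_end(episode=ep)
```

In real RLlib code you would pass the callbacks class via the trainer config (e.g. config["callbacks"] = SingleEpisodeRewardCallbacks) rather than calling the hook yourself.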

If you need to access the values in a custom loop where you call trainer.train(), then reading them from the return value of that call, as shown by @lucasalavapena, is the best way to go.
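A sketch of such a custom loop. FakeTrainer below is a stand-in so the loop runs without Ray; a real RLlib trainer's train() returns a result dict with the same "hist_stats" keys:

```python
# Stand-in trainer whose train() returns a minimal RLlib-style result dict.
class FakeTrainer:
    def __init__(self):
        self._iter = 0

    def train(self):
        self._iter += 1
        # A real result dict has many more keys; only the one we read is mocked.
        return {
            "hist_stats": {
                "episode_reward": [1.0 * self._iter, 2.0 * self._iter],
            }
        }

trainer = FakeTrainer()
all_episode_rewards = []
for _ in range(3):
    result = trainer.train()
    # "hist_stats/episode_reward" is a window over recent episodes, so with a
    # real trainer consecutive iterations can repeat entries; deduplicate if
    # you need an exact per-episode record.
    all_episode_rewards.extend(result["hist_stats"]["episode_reward"])
```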
