How does `rllib train` log the reward on TensorBoard?

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hi, I’m curious how the “rllib train” CLI command logs the episode_reward_mean information to TensorBoard.
TensorBoard seems to update once per training iteration. At that point, is episode_reward_mean updated regardless of how many episodes were run during that iteration?
For example, if one iteration finishes after rolling out 4000 timesteps, is episode_reward_mean calculated only over that span of timesteps? And what happens if no episode finishes within the iteration because the episodes are too long?
Also, if I set the “metrics_num_episodes_for_smoothing” option to 1 and multiple episodes finish within a single iteration, how does RLlib log episode_reward_mean?

Thank you.

Hi @keep9oing ,

and welcome to the forum. What happens under the hood is that episode rewards are averaged over the last min_history=100 episodes, so the value you see in TensorBoard is better thought of as a moving average over completed episodes rather than a per-iteration statistic.
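To illustrate the idea (this is just a minimal sketch of the smoothing behavior, not RLlib’s actual implementation): completed episode returns are pushed into a fixed-size window, and the logged value at each iteration is the mean over whatever is currently in that window. Because the window spans episodes, the value carries over across iterations; if no episode finishes during an iteration, there is nothing new to average in.

```python
from collections import deque

# Hypothetical sketch of the smoothing behavior: a fixed-size window
# over the most recently completed episode returns.
WINDOW = 100  # corresponds to the default smoothing window of 100 episodes

episode_returns = deque(maxlen=WINDOW)

def on_episode_end(episode_return: float) -> None:
    # Each finished episode pushes its total return into the window,
    # evicting the oldest entry once the window is full.
    episode_returns.append(episode_return)

def episode_reward_mean() -> float:
    # At the end of a training iteration, the logged value is the mean
    # over the episodes currently in the window. If no episode has
    # finished yet, there is nothing to average.
    if not episode_returns:
        return float("nan")
    return sum(episode_returns) / len(episode_returns)
```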

See the source code here.
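Regarding your metrics_num_episodes_for_smoothing question: setting it to 1 means the window only holds the single most recently finished episode, so the logged value is essentially that episode’s return rather than an average. A minimal sketch of setting the option, assuming a recent Ray/RLlib version where `AlgorithmConfig.reporting()` is available (the environment and the result key are just examples and may differ in your setup):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # example environment
    # Keep only the single most recent episode in the metrics window,
    # instead of the default 100-episode moving average.
    .reporting(metrics_num_episodes_for_smoothing=1)
)

algo = config.build()
result = algo.train()
# The exact result key can vary by Ray version; in older result dicts
# it is "episode_reward_mean" at the top level.
print(result.get("episode_reward_mean"))
```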

Hope that explains it.
Simon