How does `rllib train` log the reward to TensorBoard?

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Hi, I’m curious how the “rllib train” CLI command logs the episode_reward_mean metric to TensorBoard.
TensorBoard seems to update once per training iteration. At that point, is episode_reward_mean updated regardless of how many episodes were run during training?
For example, if one iteration finishes after rolling out 4000 timesteps, is the episode_reward_mean computed over that window of timesteps? And what happens if no episode finishes within the iteration because the episodes are too long?
Also, if I set the “metrics_num_episodes_for_smoothing” option to 1 and multiple episodes finish within a single iteration, how does RLlib log the episode_reward_mean?

Thank you.

Hi @keep9oing ,

and welcome to the forum. What is happening under the hood is that episode rewards are averaged over the last min_history=100 completed episodes, so the curve you see in TensorBoard is best read as a moving average.

See the source code here.
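To make the behavior concrete, here is a minimal sketch (not RLlib's actual implementation; the class and method names are my own) of how a moving average over the last N completed episodes behaves. The `window` parameter plays the role of the smoothing window (100 by default, or whatever you pass via the smoothing option):

```python
import math
from collections import deque


class EpisodeRewardSmoother:
    """Illustrative sketch of episode_reward_mean as a moving average
    over the last `window` completed episodes (names are hypothetical)."""

    def __init__(self, window=100):
        # deque with maxlen automatically drops the oldest episode
        # once more than `window` episodes have finished
        self.returns = deque(maxlen=window)

    def add_episode(self, episode_return):
        """Record the total return of one completed episode."""
        self.returns.append(episode_return)

    def episode_reward_mean(self):
        # If no episode has finished yet (e.g. episodes longer than one
        # iteration's rollout), there is nothing to average
        if not self.returns:
            return float("nan")
        return sum(self.returns) / len(self.returns)


smoother = EpisodeRewardSmoother(window=3)
for r in [1.0, 2.0, 3.0, 4.0]:
    smoother.add_episode(r)
# Only the last 3 episodes count: (2 + 3 + 4) / 3 = 3.0
print(smoother.episode_reward_mean())
```

With a window of 1, the reported value is simply the return of the most recently completed episode, so if several episodes finish within one iteration, only the last one is reflected in that iteration's logged value.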

Hope that explains it.