How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hi, I'm curious about how the `rllib train` CLI command logs the `episode_reward_mean` metric to TensorBoard.
TensorBoard seems to update once per training iteration. At that point, is `episode_reward_mean` updated regardless of how many episodes were run during that iteration?
For example, if one iteration completes after rolling out 4000 timesteps, is `episode_reward_mean` computed only over the episodes that finished within those timesteps? And what happens if no episode finishes during the iteration because the episodes are too long?
Also, if I set the `metrics_num_episodes_for_smoothing` option to 1 and multiple episodes finish within a single iteration, how does RLlib log `episode_reward_mean`?
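To make my question concrete, here is a minimal sketch of how I currently *assume* the smoothing works (this is my own illustration, not code taken from the RLlib source): finished episode returns go into a bounded window of size `metrics_num_episodes_for_smoothing`, and `episode_reward_mean` is the mean of that window when the iteration's metrics are collected.

```python
from collections import deque

class RewardSmoother:
    """Hypothetical model of episode_reward_mean smoothing (an assumption,
    not RLlib's actual implementation)."""

    def __init__(self, num_episodes_for_smoothing: int):
        # Bounded window: only the most recent N episode returns are kept.
        self.window = deque(maxlen=num_episodes_for_smoothing)

    def on_episode_end(self, episode_return: float) -> None:
        self.window.append(episode_return)

    def episode_reward_mean(self) -> float:
        # If no episode has finished yet, there is nothing to average.
        if not self.window:
            return float("nan")
        return sum(self.window) / len(self.window)

# With smoothing window = 1, only the last finished episode would count,
# even if three episodes finished within the same iteration.
smoother = RewardSmoother(num_episodes_for_smoothing=1)
for ret in [10.0, 20.0, 30.0]:
    smoother.on_episode_end(ret)
print(smoother.episode_reward_mean())  # only the last return, 30.0
```

Is this mental model correct, or does RLlib aggregate the per-iteration episodes differently before logging?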