I was used to work with “episodes_this_iter” in the old API Stack. This column showed per training iteration, how many episodes were run. This was a value summed up across all rollout workers.
How does this translate to New API Stack?
env_runners/num_episodes_lifetime or env_runners/num_episodes ?
In the new API stack, the equivalent of “episodes_this_iter” is env_runners/num_episodes, which reports the number of episodes completed in the current training iteration, summed across all EnvRunners. The metric env_runners/num_episodes_lifetime instead tracks the cumulative number of episodes completed over the entire training run, not just the current iteration. This is confirmed by the Ray RLlib documentation and the source code where per-iteration episode counts are logged and reported as NUM_EPISODES.