Does training_iteration correspond to number of episodes?

According to the documentation, ‘training_iteration’ counts the number of times tune.report() has been called. Would that always be equivalent to the number of training episodes when training an RL agent with RLlib?

Hi @carlorop ,

training_iteration does count the number of iterations in which a training step has been made. This is, however, not the same as the number of episodes in RLlib. The reason is that whenever the RolloutWorkers in RLlib collect new experiences from the environment, they do so either by taking a predefined number of steps in the environment or by stepping for as long as an episode takes. We choose one or the other by setting batch_mode to either truncate_episodes (the default) or complete_episodes. These settings define what data gets collected into a training batch.
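As a rough sketch of where this setting lives, here is a dict-style Trainer config fragment. The env name and the numeric values are placeholders I picked for illustration, and the key names are the ones I know from the dict-based Trainer config, so they may differ in other RLlib versions.

```python
# Dict-style Trainer config fragment (illustrative values, not recommendations).
config = {
    "env": "CartPole-v1",               # placeholder environment
    "batch_mode": "truncate_episodes",  # default: cut rollouts after a fixed number of steps
    "rollout_fragment_length": 200,     # steps each worker collects per rollout (assumed value)
    "train_batch_size": 4000,           # steps per training batch (assumed value)
}

# Same settings, but workers only return whole episodes.
config_complete = {**config, "batch_mode": "complete_episodes"}
```

Either dict would then be passed as the config of the Trainer or of tune.run.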

Note that a training batch can contain multiple episodes in both cases. However, complete_episodes ensures that the training batch contains only complete episodes (as long as no horizon is set).

Coming back to your question: a single training batch usually does not contain exactly one episode, and since the Trainer trains on one batch per iteration, training_iteration and the number of episodes stepped in the environment are not the same.
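To make the mismatch concrete, here is a small toy model in plain Python. It is not RLlib code and not how RLlib samples internally; simulate and its parameters are hypothetical names, and it only assumes that each iteration trains on a batch of roughly steps_per_iteration environment steps.

```python
from itertools import cycle

def simulate(episode_lengths, steps_per_iteration, batch_mode, num_iterations):
    """Toy model: return how many episodes finish in each training iteration."""
    lengths = cycle(episode_lengths)  # endless stream of episode lengths
    remaining = next(lengths)         # steps left in the currently running episode
    finished_per_iteration = []
    for _ in range(num_iterations):
        steps, finished = 0, 0
        while steps < steps_per_iteration:
            if batch_mode == "complete_episodes":
                take = remaining  # always run the episode to its end
            else:
                # truncate_episodes: stop once the batch is full
                take = min(remaining, steps_per_iteration - steps)
            steps += take
            remaining -= take
            if remaining == 0:  # episode ended, start the next one
                finished += 1
                remaining = next(lengths)
        finished_per_iteration.append(finished)
    return finished_per_iteration

# Episodes of 30 steps, about 100 steps per iteration:
print(simulate([30], 100, "truncate_episodes", 3))   # [3, 3, 4]
print(simulate([30], 100, "complete_episodes", 3))   # [4, 4, 4]
# Episodes of 250 steps: some iterations finish no episode at all.
print(simulate([250], 100, "truncate_episodes", 3))  # [0, 0, 1]
```

In all three runs training_iteration would go 1, 2, 3, while the episode count grows at a different rate. If you need the actual number of episodes, the training results also contain episode counters (episodes_this_iter and episodes_total, as far as I know), which are a better thing to look at than training_iteration.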

For the configuration settings, take a look at the Trainer configuration.

Hope this helps