According to the documentation, ‘training_iteration’ counts the number of times tune.report() has been called. Would that always be equivalent to the number of training episodes when training an RL agent with RLlib?
Hi @carlorop,
training_iteration counts the number of iterations in which a training step has been made. This is, however, not identical to the number of episodes in RLlib. The reason is that whenever the RolloutWorkers in RLlib collect new experiences from the environment, they can do so either by taking a predefined number of steps in the environment or by stepping for as long as an episode takes. We choose one or the other by setting batch_mode to either truncate_episodes (the default) or complete_episodes. These settings define what data gets collected into a training batch.
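To make the two settings concrete, here is a minimal sketch of the relevant config keys as a plain Python dict. The numeric value is only an illustrative assumption, and I am assuming the dict-style Trainer config where batch_mode and rollout_fragment_length are top-level keys:

```python
# Sketch of the sampling-related part of a Trainer config (values are illustrative).
config = {
    # "truncate_episodes" (the default): collect a fixed number of env steps,
    # cutting episodes at the fragment boundary if necessary.
    # "complete_episodes": keep stepping until the running episodes are finished.
    "batch_mode": "complete_episodes",
    # Number of env steps a worker collects per rollout fragment (assumed value).
    "rollout_fragment_length": 200,
}

print(config["batch_mode"])
```

A dict like this would typically be merged into the rest of your Trainer config, for example the one you pass to tune.run(..., config=config).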
Note that a training batch can contain multiple episodes in both cases; however, complete_episodes ensures that the training batch always contains only complete episodes (as long as no horizon is set).
Coming back to your question: a single training batch usually does not contain exactly one episode, and since the Trainer trains on a batch in each iteration, training_iteration and the number of episodes stepped in the environment are not the same.
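If it helps, here is a toy simulation in plain Python (no RLlib involved) of why the two counters drift apart. It assumes a hypothetical environment whose episodes all have the same fixed length and a sampler that collects exactly batch_size steps per iteration, roughly like truncate_episodes. The function name and parameters are made up for illustration; real RLlib sampling also depends on the number of workers and other settings.

```python
def episodes_completed_per_iteration(num_iterations, batch_size, episode_len):
    """Count how many episodes finish inside each fixed-size training batch."""
    completed = []
    steps_into_episode = 0  # progress inside the currently running episode
    for _ in range(num_iterations):
        finished = 0
        for _ in range(batch_size):
            steps_into_episode += 1
            if steps_into_episode == episode_len:
                # Episode ended; the next env step starts a new one.
                finished += 1
                steps_into_episode = 0
        completed.append(finished)
    return completed


# Short episodes: several episodes fit into one training iteration.
print(episodes_completed_per_iteration(3, batch_size=200, episode_len=50))
# Long episodes: some iterations finish no episode at all.
print(episodes_completed_per_iteration(3, batch_size=200, episode_len=300))
```

In both cases three training iterations were run, but the number of finished episodes is different from three, which is the mismatch described above.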
For the configuration settings, take a look at the Trainer configuration.
Hope this helps