Does the agent train per episode or per iteration?

The parameter train_batch_size controls the size of the training batch. If train_batch_size=n, does that mean the agent starts learning after n episodes or after n iterations (calls to the step() method of the environment)?

Hi @carlorop

The timing of the first iteration of your optimization algorithm depends not only on train_batch_size, but also on how often experiences are collected from the rollout workers and how large the collected chunks are (see rollout_fragment_length).

Your Trainer instance starts training as soon as it has at least train_batch_size experiences at hand, where each experience usually corresponds to one time step in your environment.

So the n in train_batch_size=n denotes steps in the environment, not episodes, although your learner thread might start learning later.
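
To make the counting concrete, here is a rough sketch in plain Python (it does not call RLlib). It assumes a simplified model in which every sampling round returns one chunk of rollout_fragment_length steps from each worker, and training starts once at least train_batch_size steps have been gathered. The function name and the num_workers argument are made up for illustration; the real collection logic may differ between versions and algorithms.

```python
import math


def steps_before_first_update(train_batch_size, rollout_fragment_length, num_workers=1):
    """Estimate how many environment steps are collected before the first update.

    Simplified assumption (not RLlib's actual code): each sampling round
    yields one fragment of `rollout_fragment_length` steps per worker, and
    rounds repeat until at least `train_batch_size` steps are available.
    """
    steps_per_round = rollout_fragment_length * num_workers
    rounds = math.ceil(train_batch_size / steps_per_round)
    return rounds * steps_per_round


# The batch size divides evenly into fragments: exactly 4000 steps are collected.
print(steps_before_first_update(4000, 200, num_workers=2))  # 4000

# The batch size is not a multiple of the fragment length: 4 fragments of
# 300 steps are needed, so 1200 steps are collected before the first update.
print(steps_before_first_update(1000, 300))  # 1200
```

In both cases the unit is environment steps. Episode boundaries never enter the calculation, so a batch can contain several short episodes or only part of one long episode.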

Cheers