train_batch_size controls the size of the training batch. If train_batch_size=n, does it mean that the agent starts learning after n episodes or after n iterations (calls to the step() method of the environment)?
The timing of the first iteration of your optimization algorithm depends not only on the
train_batch_size, but also on how often experiences are collected from rollout workers or how large the collected chunks are (see
Your Trainer instance starts training as soon as it has at least train_batch_size experiences at hand, where each experience usually corresponds to one time step in your environment.
So train_batch_size=n counts steps in the environment, not episodes, although your learner thread might start learning somewhat later than after exactly n steps.
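
To make the timing concrete, here is a minimal plain-Python sketch of the arithmetic. It does not call RLlib. The names fragment_length and num_workers are hypothetical stand-ins for "how large the collected chunks are" and "how many rollout workers collect them", and the sketch assumes every worker returns exactly fragment_length time steps per sampling round.

```python
import math


def steps_before_first_update(train_batch_size, fragment_length, num_workers=1):
    """Return (sampling_rounds, env_steps_collected) before the first update.

    Assumption (not taken from RLlib itself): each worker hands back exactly
    `fragment_length` time steps per round, and training begins once at least
    `train_batch_size` time steps have been collected.
    """
    steps_per_round = fragment_length * num_workers
    # Round up: a partially filled batch is not enough to start training.
    rounds = math.ceil(train_batch_size / steps_per_round)
    return rounds, rounds * steps_per_round


# Chunks divide the batch evenly: learning starts after exactly n steps.
print(steps_before_first_update(4000, 200, num_workers=2))  # (10, 4000)

# Chunks do not divide the batch evenly: learning starts after more than n steps.
print(steps_before_first_update(1000, 300))  # (4, 1200)
```

In the second call the batch of 1000 steps is only complete after the fourth chunk arrives, that is, after 1200 environment steps. This is one way the learner can start later than after exactly n steps, and episodes do not enter the calculation at all.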