The parameter train_batch_size controls the size of the training batch. If train_batch_size=n, does it mean that the agent starts learning after n episodes, or after n steps (calls to the step() method of the environment)?
Hi @carlorop
The timing of the first iteration of your optimization algorithm depends not only on train_batch_size, but also on how often experiences are collected from the rollout workers and how large the collected chunks are (see rollout_fragment_length).
Your Trainer instance starts training as soon as it has at least train_batch_size experiences at hand, where each experience usually corresponds to one time step in your environment.
So the n in train_batch_size=n denotes steps in the environment, not episodes, although your learner thread might start learning somewhat later than that.
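To make the interaction with rollout_fragment_length concrete, here is a rough back-of-the-envelope sketch. It is not RLlib code: the function name steps_before_first_update is made up for illustration, and it assumes a simplified model in which every rollout worker returns exactly rollout_fragment_length steps per sampling round and fragments are accumulated until at least train_batch_size steps are available. Real behavior can differ depending on the algorithm and its sampling settings.

```python
import math


def steps_before_first_update(train_batch_size, rollout_fragment_length, num_workers=1):
    """Estimate sampling rounds and env steps collected before the first update.

    Hypothetical helper under a simplified model: each worker contributes
    exactly `rollout_fragment_length` steps per round, and training starts
    once at least `train_batch_size` steps have been gathered.
    """
    steps_per_round = rollout_fragment_length * num_workers
    rounds = math.ceil(train_batch_size / steps_per_round)
    return rounds, rounds * steps_per_round


# Two workers, 200-step fragments, batch of 4000 steps.
print(steps_before_first_update(4000, 200, num_workers=2))

# Fragments do not divide the batch size evenly, so more steps are collected.
print(steps_before_first_update(1000, 300, num_workers=1))
```

In both cases the count is in environment steps, not episodes. An episode boundary may fall anywhere inside a fragment, which is why the first update can happen a bit later than exactly n steps.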
Cheers