The parameter train_batch_size controls the size of the training batch. If train_batch_size=n, does it mean that the agent starts learning after n episodes or after n iterations (calls to the step() method of the environment)?
Hi @carlorop
The timing of the first iteration of your optimization algorithm depends not only on the train_batch_size, but also on how often experiences are collected from rollout workers or how large the collected chunks are (see rollout_fragment_length).
Your Trainer instance starts training as soon as it has a minimum of train_batch_size experiences at hand, where each experience usually corresponds to one time step in your environment.
So the n in train_batch_size=n denotes steps in the environment and not episodes, although your learner thread might start learning somewhat later than that.
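To make the interaction with rollout_fragment_length a bit more concrete, here is a rough sketch in plain Python. It is not RLlib code and does not call any RLlib API. It assumes a simplified model in which every sampling round returns one fragment of rollout_fragment_length steps per worker, and rounds repeat until at least train_batch_size steps are available. The function name and the num_workers argument are made up for illustration.

```python
def steps_before_first_update(train_batch_size, rollout_fragment_length, num_workers):
    """Count environment steps collected before the first training update.

    Simplified model (assumption, not RLlib's actual implementation):
    each sampling round, every worker returns one fragment of
    rollout_fragment_length steps, and rounds repeat until at least
    train_batch_size steps have been collected.
    """
    collected = 0
    rounds = 0
    while collected < train_batch_size:
        # One round: every worker contributes one fragment of steps.
        collected += rollout_fragment_length * num_workers
        rounds += 1
    return collected, rounds


# Example values chosen only for illustration.
collected, rounds = steps_before_first_update(
    train_batch_size=1000, rollout_fragment_length=200, num_workers=2
)
print(collected, rounds)
```

Under these assumptions the count is in environment steps, never in episodes, and the first update can only happen once a whole number of fragments adds up to at least train_batch_size.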
Cheers