Hi there! I know that under the hood, RLlib algorithms with multiple workers usually perform episode rollouts, with each worker doing multiple rollouts until the specified training batch size is reached. I was wondering whether it's possible to specify instead that each worker should collect X episodes? That way it wouldn't matter if some workers end up with shorter episodes than others; they would all collect an even number of episodes.
I think the gist of this question lies in understanding the relationship between RLlib rollout batches and episodes. Hints lie in the explanation of the batch_mode parameter in AlgorithmConfig().env_runners().
With the complete_episodes setting, you can ensure that no episodes are truncated, so each rollout batch contains only whole episodes and its configured size acts as a minimum. Together with the train_batch_size parameter, you should be able to tweak batches with a consistent number of episodes.
This is easier if every episode has the same length, but that of course depends on your environment design.
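As a rough sketch of what such a config could look like (the algorithm, environment, and numeric values here are just illustrative choices, not from your setup, and the exact keyword names can vary between Ray versions):

```python
# Hypothetical example: collect only complete episodes per worker,
# then fill the train batch with whole episodes.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .env_runners(
        num_env_runners=4,
        # Never truncate episodes mid-rollout: each sampled
        # batch consists only of complete episodes.
        batch_mode="complete_episodes",
    )
    .training(
        # Each training step gathers at least this many env steps;
        # with complete_episodes, whole episodes are appended until
        # this threshold is reached (so the batch may overshoot it).
        train_batch_size=4000,
    )
)
algo = config.build()
```

Note that with variable-length episodes the batch can overshoot train_batch_size, since the last episode is never cut off.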