Hello everyone, although I have had major successes training PPO models with RLlib, I still have difficulty understanding the trajectory-collection mechanism, and in particular how `rollout_fragment_length` affects it.
My particular case is that I want to keep fragments of length 50 in a fixed-size train batch; that is, the workers keep filling the training batch with length-50 trajectories until the batch is full. However, RLlib always pops up a warning and auto-adjusts my fragment length so that `rollout_fragment_length * num_workers * num_envs_per_worker` is exactly equal to `train_batch_size`. So I guess the "accumulated into an extra-large train batch" behavior suggested here ('rollout_fragment_length' and 'truncate_episodes' · Issue #10179 · ray-project/ray · GitHub) won't happen anymore.
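For concreteness, here is a minimal sketch of the kind of config I mean (the worker and env counts are illustrative assumptions, not my exact setup):

```python
# Minimal sketch of the situation, using an old-style RLlib config dict.
# The specific numbers here are illustrative, not my actual settings.
config = {
    "num_workers": 4,                  # 4 rollout workers
    "num_envs_per_worker": 1,          # 1 env per worker
    "batch_mode": "truncate_episodes",
    "rollout_fragment_length": 50,     # what I ask for: 50-step fragments
    "train_batch_size": 1000,
    # RLlib warns and auto-adjusts rollout_fragment_length to
    # 1000 / (4 * 1) = 250, so that
    #   rollout_fragment_length * num_workers * num_envs_per_worker
    # exactly equals train_batch_size.
}
```

With these numbers, asking for 50-step fragments is exactly what triggers the warning for me, and RLlib ends up collecting 250-step fragments instead.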
Can I safely assume that the training batch is therefore filled entirely with the very first fragments collected from all workers, and that fragments collected after the first will not be part of any training batch (besides being used for calculating the