How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
The PPOTrainer class auto-adjusts rollout_fragment_length using floor division if
train_batch_size % (num_workers * num_envs_per_worker * rollout_fragment_length) != 0.
Is there a reason why rollout_fragment_length isn’t auto-adjusted by a ceiling operation?
An example:
train_batch_size = 4000
rollout_fragment_length = 200
num_workers = 7
num_envs_per_worker = 1
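RLlib auto-adjusts rollout_fragment_length to 571 (the result of the floor division 4000 // 7). Since 7 * 571 = 3997 falls just short of 4000, a second round of sampling seems to be required, and RLlib ends up collecting a train batch of size 7994.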
Instead, a ceiling operation, i.e. math.ceil(train_batch_size / (num_workers * num_envs_per_worker)), would yield a new rollout_fragment_length of 572 and the collected train batch would have only a size of 4004.
Sample collection per train step would be roughly halved (4004 instead of 7994 timesteps).
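
To make the arithmetic above easy to check, here is a minimal sketch in plain Python. It does not call RLlib. The helper collected_batch_size is hypothetical and assumes that every worker returns exactly rollout_fragment_length steps per sampling round and that rounds are repeated until at least train_batch_size steps are collected, which is how I understand the behavior described above.

```python
def collected_batch_size(train_batch_size, num_workers, num_envs_per_worker,
                         rollout_fragment_length):
    """Hypothetical model (not RLlib code): sample whole rounds until the
    collected steps reach at least train_batch_size."""
    steps_per_round = num_workers * num_envs_per_worker * rollout_fragment_length
    # Integer ceiling division: number of full sampling rounds needed.
    rounds = -(-train_batch_size // steps_per_round)
    return rounds * steps_per_round


train_batch_size = 4000
num_workers = 7
num_envs_per_worker = 1
parallel_envs = num_workers * num_envs_per_worker

# Floor division, as in the current auto-adjustment described above.
floor_length = train_batch_size // parallel_envs
# Ceiling division, as proposed in this post.
ceil_length = -(-train_batch_size // parallel_envs)

floor_batch = collected_batch_size(
    train_batch_size, num_workers, num_envs_per_worker, floor_length)
ceil_batch = collected_batch_size(
    train_batch_size, num_workers, num_envs_per_worker, ceil_length)

print(floor_length, floor_batch)  # 571 7994
print(ceil_length, ceil_batch)    # 572 4004
```

Under these assumptions the floor variant needs two sampling rounds per train step, while the ceiling variant needs only one and overshoots train_batch_size by 4 steps.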