How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
The PPOTrainer class auto-adjusts rollout_fragment_length by a floor division if train_batch_size % (num_workers * num_envs_per_worker * rollout_fragment_length) != 0.
Is there a reason why rollout_fragment_length isn’t auto-adjusted by a ceiling operation instead?
An example:
train_batch_size = 4000
rollout_fragment_length = 200
num_workers = 7
num_envs_per_worker = 1
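RLlib auto-adjusts rollout_fragment_length to 571 (the result of the floor division 4000 // 7). One round of sampling then yields only 7 * 571 = 3997 timesteps, which is just short of 4000, so a second round is needed and the collected train batch ends up with a size of 7994.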
Instead, a ceiling operation, i.e. math.ceil(train_batch_size / (num_workers * num_envs_per_worker)), would yield a new rollout_fragment_length of 572, and the collected train batch would have a size of only 7 * 572 = 4004.
Sample collection per train step would be roughly halved in this example (4004 instead of 7994 timesteps).
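
To make the arithmetic easy to check, here is a minimal sketch in plain Python. It does not call RLlib, and it is not RLlib's actual implementation: the helper collected_batch_size is a hypothetical name of mine, and it assumes that every worker returns full fragments and that sampling rounds repeat until at least train_batch_size timesteps have been collected.

```python
import math


def collected_batch_size(train_batch_size, num_workers,
                         num_envs_per_worker, rollout_fragment_length):
    """Hypothetical helper (not an RLlib function).

    Assumes each sampling round returns one full fragment per env and
    that rounds repeat until train_batch_size is reached or exceeded.
    """
    per_round = num_workers * num_envs_per_worker * rollout_fragment_length
    rounds = math.ceil(train_batch_size / per_round)
    return rounds * per_round


# Values from the example above.
train_batch_size = 4000
num_workers = 7
num_envs_per_worker = 1
parallel_envs = num_workers * num_envs_per_worker

# Floor division, as in the observed auto-adjustment.
floor_fragment = train_batch_size // parallel_envs
# Ceiling, as proposed in this post.
ceil_fragment = math.ceil(train_batch_size / parallel_envs)

floor_batch = collected_batch_size(
    train_batch_size, num_workers, num_envs_per_worker, floor_fragment)
ceil_batch = collected_batch_size(
    train_batch_size, num_workers, num_envs_per_worker, ceil_fragment)

print(floor_fragment, floor_batch)  # 571 7994
print(ceil_fragment, ceil_batch)    # 572 4004
```

With the floor, one round falls 3 timesteps short of 4000 and forces a full second round; with the ceiling, a single round overshoots by only 4 timesteps.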