Handling of Incomplete Episodes in RLlib

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I have a long traffic simulation with a reward at every step. Let's say one day, i.e. 86,400 steps.
I want a smaller `train_batch_size`, e.g. 2,000, to update my model with the collected samples.
I use 20 rollout workers, each with one env, so each iteration should collect 100 samples from each environment.

I saw in other issues that Ray only reports episode rewards once a simulation has terminated; before that they are just NaN. So in my case, for the first 86,400 × 20 steps.
I have set `rollout_fragment_length = config.ppo.train_batch_size // num_rollout_workers` and `batch_mode="truncate_episodes"`.
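
For reference, a minimal sketch of what my setup looks like (the env name `MyTrafficEnv` is a placeholder; the rest follows the standard `PPOConfig` API):

```python
from ray.rllib.algorithms.ppo import PPOConfig

num_rollout_workers = 20
train_batch_size = 2_000

config = (
    PPOConfig()
    # "MyTrafficEnv" is a placeholder for my registered traffic simulation env.
    .environment(env="MyTrafficEnv")
    .rollouts(
        num_rollout_workers=num_rollout_workers,
        # 2000 // 20 = 100 env steps collected per worker per iteration.
        rollout_fragment_length=train_batch_size // num_rollout_workers,
        # Episodes are cut at the fragment boundary instead of waiting for "done".
        batch_mode="truncate_episodes",
    )
    .training(train_batch_size=train_batch_size)
)
```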

When no simulation has terminated yet, does the algorithm still learn from the intermediate rewards, or does Ray only train once simulations are completed?
Can I do anything so that intermediate rewards are reported more frequently? My metrics logged to wandb are also not updated frequently.
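
For context, this is roughly how the wandb logging is wired up (a sketch; the project name is a placeholder and I assume the standard `WandbLoggerCallback` from Ray AIR):

```python
from ray import train, tune
from ray.air.integrations.wandb import WandbLoggerCallback

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        # Metrics from each training iteration are forwarded to wandb, but
        # episode_reward_mean stays NaN until the first episode terminates.
        callbacks=[WandbLoggerCallback(project="traffic-sim")],
    ),
)
results = tuner.fit()
```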

Versions:

  • python 3.10.14
  • ray 2.10.0