Handling of Incomplete Episodes in RLlib

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have a long traffic simulation with rewards at every step. Let's say one day, i.e. 86,400 steps.
I want a lower train_batch_size, e.g. 2,000, to update my model with the collected samples.
I use 20 rollout workers, each with one env, so it should collect 100 samples from each environment.

I saw in other issues that Ray only reports rewards once my simulations have terminated; before that they are just NaN. So in my case that would be the first 86,400 × 20 steps.
I have set rollout_fragment_length = config.ppo.train_batch_size // num_rollout_workers and batch_mode="truncate_episodes".
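
In code, the relevant part of my config looks roughly like this (the env name is a placeholder and I've inlined my config values):

```python
from ray.rllib.algorithms.ppo import PPOConfig

num_rollout_workers = 20
train_batch_size = 2_000  # my lowered batch size

config = (
    PPOConfig()
    .environment("my_traffic_env")  # placeholder for my custom traffic env
    .rollouts(
        num_rollout_workers=num_rollout_workers,
        # 2_000 // 20 = 100 steps collected per worker per iteration
        rollout_fragment_length=train_batch_size // num_rollout_workers,
        batch_mode="truncate_episodes",
    )
    .training(train_batch_size=train_batch_size)
)
```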

When no simulation terminates, does the algorithm still learn from the intermediate rewards, or does Ray only train once simulations are completed?
Can I do anything so that intermediate rewards are reported more frequently? My metrics logged to wandb are also not updated frequently.

Versions:

  • python 3.10.14
  • ray 2.10.0

To answer your first question, truncated episodes are indeed used during training. I actually went into the code to poke at that just yesterday for an experiment I was running.

As far as reporting intermediate rewards goes, yes, it's no trouble at all. Just write a callback that logs your custom metric (that'd be your average reward per timestep, I'd think) on sample end. The examples folder has some code for callbacks and custom logging that'll help you out there.
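
Something along these lines should get you started. This is only a minimal sketch against the old API stack's DefaultCallbacks in Ray 2.10; the class name, metric key, and the print are my own placeholders, I'm assuming a single-agent setup so samples["rewards"] is a flat array, and the on_learn_on_batch part is just one way I'd surface the number in the reported results:

```python
import numpy as np
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class IntermediateRewardCallbacks(DefaultCallbacks):
    def on_sample_end(self, *, worker, samples, **kwargs):
        # Runs on every rollout worker each time a fragment of
        # rollout_fragment_length steps has been collected -- long before
        # any episode terminates.
        mean_r = float(np.mean(samples["rewards"]))
        print(f"collected {samples.count} steps, mean reward/step: {mean_r:.4f}")

    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # Whatever you put into `result` here is reported with the learner
        # stats every training iteration, so it also reaches your wandb
        # logger without waiting for an episode to finish.
        result["mean_reward_per_step"] = float(np.mean(train_batch["rewards"]))
```

Hook it up with config.callbacks(IntermediateRewardCallbacks) and the metric should show up in your training results every iteration.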

Edit: just noticed this was posted in September of last year. Ah well, hopefully it's a useful reference to someone with the same question later.