How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have a long traffic simulation with rewards at every step. Let's say one day, i.e. 86,400 steps.
I want a smaller train_batch_size, e.g. 2,000, to update my model with the collected samples.
I use 20 rollout workers, each with one env, so 100 samples should be collected from each environment.
I saw in other issues that Ray only reports rewards once my simulations terminate; before that they are just NaN. In my case that means the first 86,400 × 20 steps.
I have set `rollout_fragment_length = config.ppo.train_batch_size // num_rollout_workers`
and `batch_mode="truncate_episodes"`.
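For reference, here is a minimal sketch of how I build the config (the env name is a placeholder; the other values match the numbers above):

```python
from ray.rllib.algorithms.ppo import PPOConfig

num_rollout_workers = 20
train_batch_size = 2_000  # much smaller than one full episode (86,400 steps)

config = (
    PPOConfig()
    # "TrafficSim-v0" is just a placeholder for my custom environment.
    .environment(env="TrafficSim-v0")
    .rollouts(
        num_rollout_workers=num_rollout_workers,
        num_envs_per_worker=1,
        # 2,000 // 20 = 100 steps collected per worker before the fragment is returned
        rollout_fragment_length=train_batch_size // num_rollout_workers,
        batch_mode="truncate_episodes",
    )
    .training(train_batch_size=train_batch_size)
)
```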
When no simulation terminates, does the algorithm still learn from the intermediate rewards, or does Ray only train once simulations are completed?
Can I do anything so that intermediate rewards are reported more frequently? My metrics logged to wandb are also not logged frequently.
Versions:
- python 3.10.14
- ray 2.10.0