[RLlib] Continuing env, horizon and soft_horizon


I have a continuing environment (no episodes) and am wondering what the expected way is to use horizon, soft_horizon and no_done_at_end.

I assumed horizon determines the logging window (i.e. the number of steps over which the min/mean/max values of the logged metrics are computed), so I set it to 1000, which seemed reasonable for my env. I also set soft_horizon to true since I don’t want my env to reset, and no_done_at_end to true since I never want done to be true.
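The settings described above would look roughly like this as a config fragment. The three keys and their values come straight from the description. The variable name is only for illustration, and how the dict gets passed to a trainer is left out on purpose.

```python
# Config fragment for a continuing (non-episodic) env, as described above.
# "continuing_env_config" is a hypothetical name used only for this sketch.
continuing_env_config = {
    "horizon": 1000,         # step count I assumed would set the logging window
    "soft_horizon": True,    # do not reset the env when the horizon is hit
    "no_done_at_end": True,  # do not set done=True at the horizon
}
```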

I have been logging data using custom_metrics in a DefaultCallbacks subclass, and noticed that the logging does not happen at the same interval as the horizon. So now I am curious what actually triggers the logging, and what horizon does for me in a continuing environment.

I figured out that train_batch_size seems to set the length of the min/mean/max data collection window, and thus also the logging interval. Is this documented somewhere? I can’t find it, and it is not obvious to me why this value should set the logging interval.
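To make the observation concrete, here is a toy model in plain Python. It is not RLlib code. It only assumes that one result is logged per training iteration, that each iteration collects train_batch_size steps, and that the horizon merely cuts the step stream into pseudo-episodes. The function names and the value 4000 are hypothetical, chosen for illustration.

```python
def report_points(total_steps, train_batch_size):
    # Assumption: one logged result per training iteration,
    # and one iteration collects train_batch_size env steps.
    return list(range(train_batch_size, total_steps + 1, train_batch_size))


def horizon_cutoffs(total_steps, horizon):
    # Assumption: with soft_horizon the env is not reset, but the step
    # stream is still cut into pseudo-episodes every `horizon` steps.
    return list(range(horizon, total_steps + 1, horizon))


def episodes_per_report(total_steps, train_batch_size, horizon):
    # Count how many pseudo-episodes end inside each logging window.
    cutoffs = horizon_cutoffs(total_steps, horizon)
    counts = []
    window_start = 0
    for window_end in report_points(total_steps, train_batch_size):
        counts.append(sum(1 for c in cutoffs if window_start < c <= window_end))
        window_start = window_end
    return counts


# horizon=1000 as in my setup; train_batch_size=4000 is a made-up value.
reports = report_points(12000, 4000)
counts = episodes_per_report(12000, 4000, 1000)
```

Under these assumptions the logging interval follows train_batch_size (a result every 4000 steps here), while horizon only decides how many pseudo-episodes feed each min/mean/max.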