I have a continuing environment (no episodes) and am wondering what the intended way is to use horizon, soft_horizon, and no_done_at_end.
I assumed horizon determines the logging interval (i.e. the number of steps over which the min/mean/max of the logged metrics are computed), so I set it to 1000, which seemed reasonable for my env. I also set soft_horizon to true, since I don’t want my env to reset, and no_done_at_end to true, since I never want done to be true.
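For reference, the relevant part of my config looks roughly like the sketch below. Only the three settings come from what I described above; the env name is a placeholder, and the rest of the trainer config is left out.

```python
# Sketch of the relevant config entries (plain dict, as passed to a trainer).
# "MyContinuingEnv-v0" is a hypothetical placeholder, not my real env name.
config = {
    "env": "MyContinuingEnv-v0",
    # Number of steps after which an "episode" is cut off.
    "horizon": 1000,
    # Don't reset the env when the horizon is hit.
    "soft_horizon": True,
    # Don't set done=True at the end of the (artificial) episode.
    "no_done_at_end": True,
}

print(config)
```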
I have been logging data via custom_metrics in a DefaultCallbacks subclass and noticed that the logging does not happen at the same interval as the horizon. So now I am curious: what actually triggers the logging, and what does horizon do for me in a continuing environment?