I understand; I’m also still learning about all the different features, options, etc.
As you already figured out, the three relevant config options are horizon, soft_horizon, and no_done_at_end, as described here: RLlib Training APIs — Ray v1.4.0
Note that soft_horizon needs to be a boolean, so
soft_horizon = 100 doesn’t make sense. But you probably mean
soft_horizon=True as your current setting?
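To make the flags concrete, here is a rough sketch of how they might appear in a trainer config dict (flag names as in the RLlib docs; the horizon value of 100 is just illustrative, not a recommendation):

```python
# Illustrative RLlib trainer config fragment.
config = {
    "horizon": 100,          # length of a (pseudo-)episode in steps (int)
    "soft_horizon": True,    # must be a bool, not an int like 100
    "no_done_at_end": True,  # don't report done=True when the horizon is hit
}
```

The episode length belongs in `horizon`; `soft_horizon` only controls *how* the horizon is handled.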
If you don’t want your environment to terminate, did you try keeping the horizon unset?
Do you have a custom environment, where you can simply ensure in the environment implementation that it runs forever without ever returning done=True?
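As a minimal sketch of that idea (a toy class following the gym-style reset/step API; the class name and dynamics are made up):

```python
import random

class NeverEndingEnv:
    """Toy gym-style environment that never terminates:
    step() always returns done=False, so the episode runs forever."""

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        obs = random.random()  # dummy observation
        reward = 1.0           # dummy reward
        done = False           # never signal episode termination
        return obs, reward, done, {}
```

In a real setup you would subclass gym.Env and define observation/action spaces, but the key point is just that done stays False.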
Of course, with infinite “episodes”, you won’t get any episode-related metrics, but maybe you could log a custom metric (via the
on_episode_step callback) instead?
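Such a callback could look roughly like this. Note this is only a sketch: in real code you would subclass ray.rllib.agents.callbacks.DefaultCallbacks (omitted here so the snippet is self-contained), and the metric name "my_metric" is made up:

```python
class CustomMetricCallbacks:
    """Sketch of an RLlib callbacks class; in practice, subclass
    DefaultCallbacks and pass it via config["callbacks"]."""

    def on_episode_step(self, *, worker=None, base_env=None,
                        episode=None, env_index=None, **kwargs):
        # Record a running metric on every step, since episode-end
        # metrics never trigger in an infinite environment.
        episode.custom_metrics["my_metric"] = episode.total_reward
```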
From the description here and here:
soft_horizon (bool): If True, calculate bootstrapped values as if
episode had ended, but don’t physically reset the environment
when the horizon is hit.
It seems to me that you’d want to keep
soft_horizon = False to ensure that the bootstrapped value estimates are calculated identically in each step.
But if you have a horizon configured, you’d also need
soft_horizon=True to prevent the environment from actually being reset.
In one of my environments, I tried setting a horizon and
no_done_at_end=True, which did mean that the episode ran forever. But I guess the rewards were still calculated based on “episodes” of length
horizon. Still, the results were ok.
Overall, I do think most RL frameworks, including RLlib, focus more on episodic scenarios. See also this related issue: [rllib] Continuous instead of episodic problem · Issue #9756 · ray-project/ray · GitHub
For my use case, I figured I’d just use sufficiently long episodes instead of a truly continuous problem, which worked well. Would that be an option for you?