I understand; I’m also still learning about all the different features, options, etc.
As you already figured out, the three relevant config options are horizon, soft_horizon, and no_done_at_end, as described here: Getting Started with RLlib — Ray 2.8.0
I think soft_horizon needs to be a boolean, so soft_horizon = 100 doesn’t make sense. But you probably mean horizon=100 and soft_horizon=True as your current setting?
If you don’t want your environment to terminate, did you try keeping horizon: None?
Do you have a custom environment where you could simply ensure in the implementation that it runs forever and never returns done=True?
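Something along these lines, as a hypothetical sketch (classic gym API; with gymnasium you’d return terminated=False and truncated=False instead of a single done flag):

```python
import gym
import numpy as np
from gym import spaces


class NeverEndingEnv(gym.Env):
    """Hypothetical env that never terminates on its own."""

    def __init__(self, config=None):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.state = np.zeros(4, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(4, dtype=np.float32)
        return self.state

    def step(self, action):
        # Toy dynamics and reward, just for illustration.
        self.state = np.clip(
            self.state + np.random.uniform(-0.1, 0.1, size=4), -1.0, 1.0
        ).astype(np.float32)
        reward = float(action)
        # Key point: always return done=False, so RLlib never sees a natural episode end.
        return self.state, reward, False, {}
```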
Of course, with infinite “episodes”, you won’t get any episode-related metrics, but maybe you could log a custom metric (via the on_episode_step callback) instead?
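In case it helps, here is a minimal sketch of such a callback. It assumes the DefaultCallbacks API from recent Ray versions (older versions import it from ray.rllib.agents.callbacks), and the "my_metric" info key is just a hypothetical value your env would report in its info dict:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class StepMetricsCallbacks(DefaultCallbacks):
    def on_episode_step(self, *, worker, base_env, policies=None, episode, env_index=None, **kwargs):
        # "my_metric" is a hypothetical key your env would put into its info dict.
        info = episode.last_info_for() or {}
        if "my_metric" in info:
            episode.user_data.setdefault("my_metric", []).append(info["my_metric"])

    def on_episode_end(self, *, worker, base_env, policies, episode, env_index=None, **kwargs):
        values = episode.user_data.get("my_metric", [])
        if values:
            # Shows up in the results/TensorBoard under custom_metrics.
            episode.custom_metrics["my_metric_mean"] = sum(values) / len(values)
```

You’d then register it in your config via the callbacks setting.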
From the description here and here:
soft_horizon (bool): If True, calculate bootstrapped values as if
episode had ended, but don’t physically reset the environment
when the horizon is hit.
It seems to me that if you run without a horizon, you’d also want to keep soft_horizon=False, to ensure that the bootstrapped value estimates are calculated identically at every step.
But if you do have a horizon configured, you’d need soft_horizon=True so that the environment isn’t actually reset when the horizon is hit.
In one of my environments, I tried setting a horizon together with soft_horizon=True and no_done_at_end=True, and the episode did indeed run forever. I suspect the returns were still calculated based on “episodes” of length horizon, but the result was still ok.
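For reference, roughly what that looked like as a dict-style config sketch (these keys belong to the older config API and have been deprecated in more recent Ray releases, so check your version; the env name is made up):

```python
from ray import tune

config = {
    "env": "MyLongRunningEnv",  # hypothetical registered env name
    "horizon": 100,             # artificial "episode" length used for bootstrapping and metrics
    "soft_horizon": True,       # don't physically reset the env when the horizon is hit
    "no_done_at_end": True,     # don't set done=True at the artificial episode boundary
}
tune.run("PPO", config=config, stop={"timesteps_total": 100_000})
```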
Overall, I do think most RL frameworks, including RLlib, focus more on episodic scenarios. Also see this related issue: [rllib] Continuous instead of episodic problem · Issue #9756 · ray-project/ray · GitHub
For my use case, I figured I’d just use sufficiently long episodes instead of a truly continuous problem, which worked well. Would that be an option for you?