I understand; I’m also still learning about all the different features, options, etc.
As you already figured out, the three relevant config options are `horizon`, `soft_horizon`, and `no_done_at_end`, as described here: Getting Started with RLlib — Ray 2.8.0
I think `soft_horizon` needs to be a boolean, so `soft_horizon = 100` doesn’t make sense. You probably mean `horizon=100` and `soft_horizon=True` as your current setting?
If you don’t want your environment to terminate, did you try keeping `horizon: None`?
Do you have a custom environment? If so, you could simply ensure in the environment implementation that it never returns `done=True`, so it runs forever.
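As an illustration, here is a minimal sketch of such a never-terminating environment. The class name, dynamics, and reward are made up; it just follows the classic Gym-style `reset`/`step` interface (without importing Gym, so it runs standalone):

```python
import random


class NeverEndingEnv:
    """Toy continuous environment: Gym-style interface, but step()
    always returns done=False, so the episode never ends."""

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        # Made-up dynamics: the state drifts by the action plus some noise.
        self.state += action + random.uniform(-0.1, 0.1)
        reward = -abs(self.state)  # illustrative reward: stay close to 0
        done = False  # never terminate -> continuous task
        return self.state, reward, done, {}
```

In a real RLlib setup you’d additionally define `observation_space` and `action_space` on the class.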
Of course, with infinite “episodes” you won’t get any episode-related metrics, but maybe you could log a custom metric (via the `on_episode_step` callback) instead?
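As a sketch of what that could look like: in RLlib you would subclass `DefaultCallbacks` and pass it via the `callbacks` config key; here a plain class (and, in testing, a fake episode object) stands in so the snippet runs without Ray installed, and the metric name is made up:

```python
class ContinuousMetricsCallbacks:
    """Sketch of an RLlib-style callback that records a custom metric
    every step instead of relying on episode-end metrics."""

    def on_episode_step(self, *, worker=None, base_env=None, policies=None,
                        episode, env_index=None, **kwargs):
        # RLlib episode objects expose a `custom_metrics` dict whose
        # values show up (averaged) in the training results, and
        # `last_info_for()` returns the env's most recent info dict.
        info = episode.last_info_for() or {}
        episode.custom_metrics["running_reward"] = info.get("running_reward", 0.0)
```

The same pattern works for anything your env reports through its `info` dict, e.g. a running average of the reward.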
From the description here and here:

> soft_horizon (bool): If True, calculate bootstrapped values as if
> episode had ended, but don’t physically reset the environment
> when the horizon is hit.
It seems to me that you’d want to keep `soft_horizon=False` to ensure that the bootstrapped value estimates are calculated identically at every step. But if you have a horizon configured, you’d also need `soft_horizon=True` to avoid the environment actually being reset at the horizon.
In one of my environments I tried setting a horizon together with `soft_horizon=True` and `no_done_at_end=True`, which did mean that the episode ran forever. But I guess the rewards were still calculated based on “episodes” of length `horizon`. Still, the result was OK.
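Put together, the setting I mean looks roughly like this (a plain config dict; the value 100 is just a placeholder):

```python
config = {
    "horizon": 100,          # bootstrap/cut trajectories every 100 steps...
    "soft_horizon": True,    # ...but don't actually reset the env at the horizon
    "no_done_at_end": True,  # and don't set done=True when the horizon is hit
    # For the episodic alternative you'd instead keep "horizon": None and
    # let the environment itself decide when to return done=True.
}
```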
Overall, I do think most RL frameworks, including RLlib, focus more on episodic scenarios. Also see this related issue: [rllib] Continuous instead of episodic problem · Issue #9756 · ray-project/ray · GitHub
For me, I figured I’d just use sufficiently long episodes instead of a truly continuous problem, which went well. Is that an option for you?