What parameters do you pass to `AlgorithmConfig` for infinite-horizon (continuous/non-episodic) MDPs? I found the links below on this topic, but they all refer to older RLlib versions, and there have been significant API changes since then.
Currently, when I train in such an environment, the reported episode reward is `nan`, since the environment never terminates within the default 100 steps.
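For reference, here is a minimal sketch of the kind of setup I mean (the `ContinuingEnv` class is a hypothetical stand-in for my environment, and PPO is just an example algorithm; the point is that `terminated`/`truncated` are never `True`, so no episode ever completes and the episode-reward metrics come back as `nan`):

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig


class ContinuingEnv(gym.Env):
    """Hypothetical non-episodic environment: it never signals termination."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()
        reward = float(action)
        # Infinite-horizon MDP: neither terminated nor truncated is ever True.
        return obs, reward, False, False, {}


config = (
    PPOConfig()
    .environment(ContinuingEnv)
)
algo = config.build()
result = algo.train()
# With no completed episodes, the mean episode reward shows up as nan.
```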