What parameters do you pass to `AlgorithmConfig` for infinite-horizon (continuous/non-episodic) MDPs? I found the links below on this topic, but they all refer to older RLlib versions, and there have been significant API changes since then.
Currently, when I train in such an environment, the reported episode reward is `nan`, since the environment never terminates within the default 100 steps.
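For reference, here is a minimal sketch of the kind of setup I mean (the `ContinuingEnv` class is a hypothetical stand-in for my environment, and PPO is just an example algorithm; the point is that `terminated`/`truncated` are never `True`, so no episode ever completes and the episode-reward metrics come back as `nan`):

```python
import gymnasium as gym
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig


class ContinuingEnv(gym.Env):
    """Hypothetical non-episodic environment: it never signals termination."""

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()
        reward = float(action)
        # Infinite-horizon MDP: neither terminated nor truncated is ever True.
        return obs, reward, False, False, {}


config = (
    PPOConfig()
    .environment(ContinuingEnv)
)
algo = config.build()
result = algo.train()
# With no completed episodes, the mean episode reward shows up as nan.
```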