# Setting for Infinite Horizon MDPs

Hi all,

I am using DQN to solve a simple economic problem with infinite horizon. Right now, I am setting it as

```
horizon = 100
soft_horizon = True
no_done_at_end = True
```
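For context, a minimal sketch of how these options would sit inside an RLlib (Ray ~1.x) trainer config dict; the env id and gamma value here are placeholders, not part of the original question:

```python
# Hypothetical sketch of an RLlib (Ray ~1.x) DQN config using these options.
# "MyEconomyEnv" is a placeholder environment id.
config = {
    "env": "MyEconomyEnv",
    "horizon": 100,          # rollout/metrics boundary every 100 steps
    "soft_horizon": True,    # bootstrap at the horizon, but don't reset the env
    "no_done_at_end": True,  # don't emit done=True when the horizon is hit
    "gamma": 0.99,           # discount factor on future rewards
}
```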

Is this correct? The results I am getting make me doubt that the algorithm is truly discounting all the future rewards (it behaves very myopically).


I believe that's the correct setting.

What's going wrong in your scenario? Do you have an equivalent/similar episodic version of your problem that works better?

With those settings (soft_horizon=100 and no_done_at_end = True), the algorithms were struggling to solve even the simplest dynamic problems. I wasn't sure whether, with those settings, the algorithms were truly maximizing the infinite discounted sum of rewards or just the sum of episodic rewards. I tried with no horizon and with horizon = float("inf"), but then I get NaN in the mean episode reward, so it was hard to get feedback, and I was also unsure whether that works at all. Since then, I've found an episodic version of the model with a large horizon that manages to learn well. Overall, my doubt is whether RLlib assumes an episodic problem under the hood.

I appreciate your help! I am a PhD student in economics at NYU, where one of the strong suits of the program is programming big macro models. I am going on the market and am trying to convince economists that RL can be used to solve high-dimensional economic models. I am building an open-source economy simulator in Python.

RLlib is very powerful, but it's so opaque that I am on the fence about whether it is suitable for academic research. I've been working with it for months and I've tried to get under the hood, but the depth and interdependencies are overwhelming.


I understand; I'm also still learning about all the different features, options, etc.

As you already figured out, the three relevant config options are horizon, soft_horizon, and no_done_at_end, as described here: RLlib Training APIs — Ray v1.4.0

I think soft_horizon needs to be a boolean, so soft_horizon = 100 doesn't make sense. But you probably meant horizon=100 with soft_horizon=True as your current setting?

If you don't want your environment to terminate, did you try keeping horizon: None?
Do you have a custom environment, and can you simply ensure in the environment implementation that it runs forever without returning done=True?
Of course, with infinite "episodes" you won't get any episode-related metrics, but maybe you could log a custom metric (via the on_episode_step callback) instead?
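To illustrate the never-terminating idea above, here is a minimal, dependency-free sketch of such an environment. It is hypothetical: the dynamics and reward are toy placeholders, and a real RLlib environment would subclass gym.Env and declare observation_space/action_space.

```python
class InfiniteToyEnv:
    """Hypothetical sketch of an environment that runs forever:
    step() always returns done=False, so any episode boundaries come
    solely from the trainer's horizon settings, never from the env."""

    def reset(self):
        self.state = 0.0
        return [self.state]

    def step(self, action):
        # Toy dynamics: action 1 nudges the state up, action 0 nudges it down.
        delta = 0.1 if action == 1 else -0.1
        self.state = max(-1.0, min(1.0, self.state + delta))
        reward = -abs(self.state)  # reward is highest at state == 0
        return [self.state], reward, False, {}  # done is always False
```

The custom metric suggestion would then be the only way to see per-step progress, since no episode ever completes on the env side.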

From the description here and here:

soft_horizon (bool): If True, calculate bootstrapped values as if
episode had ended, but don't physically reset the environment
when the horizon is hit.

It seems to me that you'd want to keep soft_horizon = False to ensure that the bootstrapped value estimates are calculated identically in each step.
But if you do have a horizon configured, you'd also need soft_horizon=True to avoid the environment actually being reset.

I tried setting a horizon together with soft_horizon=True and no_done_at_end=True in one of my environments, which did mean that the episode ran forever. But I guess the rewards were still calculated based on "episodes" of length horizon. Still, the result was OK.

Overall, I do think most RL frameworks, including RLlib, focus more on episodic scenarios. Also see this related issue: [rllib] Continuous instead of episodic problem · Issue #9756 · ray-project/ray · GitHub

For me, I figured I'd just use sufficiently long episodes instead of a truly continuous problem, which went well. Is that an option for you?
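One way to pick "sufficiently long": with discount factor gamma, everything beyond step H carries a fraction gamma**H of the total discounted weight, so you can choose H to make that tail negligible. This is standard discounting arithmetic, not anything RLlib-specific; gamma=0.99 is just an illustrative value:

```python
# Share of total discounted reward weight that lies beyond step H:
# sum_{t>=H} gamma**t / sum_{t>=0} gamma**t = gamma**H
gamma = 0.99

for H in (100, 500, 1000):
    tail_fraction = gamma ** H
    print(f"H={H}: {tail_fraction:.2%} of discounted weight beyond the cutoff")
```

With gamma=0.99, a horizon of 100 still leaves roughly 37% of the discounted weight beyond the cutoff, which could explain seemingly myopic behavior; a horizon around 1000 leaves well under 0.01%.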

Yes, I have found an episodic version that works well. Thanks for your help!