`horizon` and `no_done_at_end` in combination with `PolicyClient` / `ExternalEnv`

klausk55 · June 17, 2021, 9:20am

Hello guys!

I want to switch from a continuing to an episodic formulation of the problem I’m working on.
My env/simulator is an external one, so I use the classes `PolicyClient` and `ExternalEnv`. What I intend to do is stop the episode after, let’s say, x hours of simulation time, reset the env, and start a new episode.
This is how I’ve thought of doing it:

I leave the config param `horizon` untouched (i.e. None → inf) but manually check in my simulator whether x hours of simulation time (“custom horizon”) have elapsed. If so, the client records the last rewards (`client.log_returns`) and ends the current episode (`client.end_episode`), I reset my env, and finally the client starts a new episode (`client.start_episode`).
I set the config param `no_done_at_end = True`, since I manually force the env episode to end rather than actually reaching a terminal state.

The idea here is to obtain logical episodes and to repeat a simulation scenario from its start state.

Would you say this is the correct way of doing it?

BTW: Under these circumstances, to mimic the behavior of `soft_horizon = True` (i.e. no reset after hitting the horizon), would it be enough to just skip the reset of my external env/simulator, that is, `client.end_episode` → ~~reset env~~ → `client.start_episode`?
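
To make the two variants concrete, here is a minimal sketch of the client-side loop I have in mind. `RecordingClient`, `ToySimulator`, `run_episodes` and the `reset_on_cut` flag are hypothetical names used only for illustration; the stub merely mirrors the method names of `PolicyClient` (`start_episode`, `get_action`, `log_returns`, `end_episode`) so the snippet runs without a policy server. The config dict only restates the settings discussed above and is an assumption about what the server side would use.

```python
# Server-side settings as described in the post (assumed, not verified here).
server_config = {"horizon": None, "no_done_at_end": True, "soft_horizon": False}


class RecordingClient:
    """Hypothetical stand-in that mirrors PolicyClient's method names.

    With a real server one would instead construct
    ray.rllib.env.policy_client.PolicyClient(<server address>).
    """

    def __init__(self):
        self.calls = []
        self._count = 0

    def start_episode(self, episode_id=None, training_enabled=True):
        self._count += 1
        episode_id = episode_id or f"episode_{self._count}"
        self.calls.append(("start_episode", episode_id))
        return episode_id

    def get_action(self, episode_id, observation):
        self.calls.append(("get_action", episode_id))
        return 0  # dummy action

    def log_returns(self, episode_id, reward, info=None):
        self.calls.append(("log_returns", episode_id))

    def end_episode(self, episode_id, observation):
        self.calls.append(("end_episode", episode_id))


class ToySimulator:
    """Hypothetical external simulator advancing a fixed time step."""

    def __init__(self, dt_hours=1.0):
        self.dt_hours = dt_hours
        self.time_hours = 0.0

    def reset(self):
        self.time_hours = 0.0
        return self.time_hours  # observation is just the clock here

    def step(self, action):
        self.time_hours += self.dt_hours
        return self.time_hours, 1.0  # (observation, reward)


def run_episodes(client, sim, horizon_hours, num_episodes, reset_on_cut=True):
    """Cut episodes after `horizon_hours` of simulated time.

    reset_on_cut=True  : end_episode -> reset env -> start_episode
    reset_on_cut=False : end_episode -> start_episode (soft-horizon-like)
    """
    obs = sim.reset()
    for _ in range(num_episodes):
        episode_id = client.start_episode(training_enabled=True)
        elapsed = 0.0
        while elapsed < horizon_hours:  # the manual "custom horizon" check
            action = client.get_action(episode_id, obs)
            obs, reward = sim.step(action)
            client.log_returns(episode_id, reward)
            elapsed += sim.dt_hours
        client.end_episode(episode_id, obs)
        if reset_on_cut:
            obs = sim.reset()
```

Calling `run_episodes(client, sim, horizon_hours=3.0, num_episodes=2)` would give two logical episodes that each start from the simulator's initial state, while `reset_on_cut=False` lets the second episode continue from where the first one was cut.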
