Hello guys!
I want to switch from a continuous to an episodic formulation of the problem I'm working on.
My env/simulator is an external one, so I use the classes `PolicyClient` and `ExternalEnv`. What I intend to do now is stop the episode after, let's say, x hours of simulation time, reset the env, and start a new episode.
This is how I’ve thought of doing it:
- I leave the config param `horizon` untouched (i.e. `None` → inf) but manually check in my simulator whether x hours of simulation time (the "custom horizon") have been reached. If so, the client records the last rewards (`client.log_returns`) and ends the current episode (`client.end_episode`), I reset my env, and finally the client starts a new episode (`client.start_episode`).
- I set the config param `no_done_at_end = True`, since I manually force the env episode to terminate instead of actually reaching a terminal state.
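In config terms, the two points above would look roughly like this. It is only a fragment in the dict-style config; everything else in my trainer config is left out, and the variable name `config` is just for illustration:

```python
# Fragment of the trainer config (all other keys omitted).
config = {
    # Left at its default: no built-in horizon, so the episode is
    # only ever cut by my own check on the simulation time.
    "horizon": None,
    # The forced cut is not a real terminal state, so it should not
    # be recorded as "done" at the end of the episode.
    "no_done_at_end": True,
}
```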
The idea here is to obtain logical episodes and to repeat a simulation scenario from its start state.
What would you say, is this the correct way of doing it?
BTW: Under these circumstances, to mimic the behavior of `soft_horizon = True` (i.e. no reset after hitting the horizon), would it be enough to just skip the reset of my external env/simulator, that is, `client.end_episode` → `client.start_episode` with the "reset env" step in between left out?
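To make the question concrete, here is a minimal sketch of the loop I have in mind. `FakeClient` and `ToySimulator` are hypothetical stand-ins so the snippet runs without a policy server; `FakeClient` only mirrors the four `PolicyClient` methods I call (`start_episode`, `get_action`, `log_returns`, `end_episode`), and the simulator interface (`reset`, `step`, `hours`) is my own assumption, not part of RLlib. The `reset_env` flag is the "skip the reset" variant from the BTW above.

```python
import uuid


class FakeClient:
    """Hypothetical stand-in with the PolicyClient methods used below."""

    def __init__(self):
        self.calls = []  # names of the calls made, in order

    def start_episode(self, episode_id=None, training_enabled=True):
        self.calls.append("start_episode")
        return episode_id or uuid.uuid4().hex

    def get_action(self, episode_id, observation):
        self.calls.append("get_action")
        return 0  # fixed dummy action

    def log_returns(self, episode_id, reward, info=None):
        self.calls.append("log_returns")

    def end_episode(self, episode_id, observation):
        self.calls.append("end_episode")


class ToySimulator:
    """Hypothetical simulator: a clock that advances one hour per step."""

    def __init__(self):
        self.hours = 0.0
        self.resets = 0

    def reset(self):
        self.hours = 0.0
        self.resets += 1
        return self.hours  # the observation is just the clock here

    def step(self, action):
        self.hours += 1.0
        return self.hours, 1.0  # (observation, reward)


def run_episodes(client, sim, horizon_hours, num_episodes, reset_env=True):
    """Cut episodes after `horizon_hours` of simulation time.

    With reset_env=False the simulator keeps running across episode
    boundaries (my attempt at mimicking soft_horizon).
    """
    obs = sim.reset()
    for _ in range(num_episodes):
        episode_id = client.start_episode(training_enabled=True)
        start = sim.hours
        # Custom horizon: checked on the simulator side, not by RLlib.
        while sim.hours - start < horizon_hours:
            action = client.get_action(episode_id, obs)
            obs, reward = sim.step(action)
            client.log_returns(episode_id, reward)
        client.end_episode(episode_id, obs)
        if reset_env:
            obs = sim.reset()


client, sim = FakeClient(), ToySimulator()
run_episodes(client, sim, horizon_hours=3, num_episodes=2)
```

In my real setup the `FakeClient` would be replaced by an actual `PolicyClient` connected to the policy server, and `ToySimulator` by my external simulator; the order of the calls is the part I'd like to have confirmed.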