How to tell the RLlib Trainer (not Tune) to run a given number of episodes

Hi @Arif_Jahangir and welcome to the forum,

the answer depends on your setup, i.e. whether you are using "batch_mode"="complete_episodes" or "batch_mode"="truncate_episodes" in your config (see here for a list of all configuration parameters).

The number of episodes per batch depends on multiple parameters: with "batch_mode"="complete_episodes", a single batch contains only complete episodes, while with "truncate_episodes" episodes can be truncated, and where they get truncated depends on the configuration parameters "rollout_fragment_length" and "count_steps_by".
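
To make the difference concrete, here is a minimal sketch (old config-dict API; the env name, fragment length, and batch size are placeholders, and the exact import path depends on your RLlib version):

from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "YourEnv-v0",                 # placeholder for your registered env
    "batch_mode": "truncate_episodes",   # or "complete_episodes"
    "rollout_fragment_length": 200,      # only relevant for truncate_episodes
    "count_steps_by": "env_steps",       # or "agent_steps" in multi-agent setups
    "train_batch_size": 4000,
}
trainer = PPOTrainer(config=config)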

That said, your episodes might take very many steps through the environment before they end, so the configuration parameters should fit your experiment well. I suggest reading the RLlib documentation on Sample Collection, where you will find good information about how episodes are collected and what to consider when choosing the corresponding configuration parameters.

See also this discussion for understanding environment and agent steps.

Best,
Simon

Thanks @Lars_Simon_Zehnder for your detailed answer.

I have a CSV file that has 1940748 rows, and an episode ends when all of these rows have been read.
I want to run PPO through these 1940748 rows 200 times (two hundred episodes).
So would my config be as follows?

"batch_mode": "truncate_episodes",
"horizon": 1940748*200,
"soft_horizon": True,
"no_done_at_end": True,

Please tell me if the above configuration is correct for running 200 episodes, with each episode having 1940748 steps.

Hi @Arif_Jahangir1,

as written in the section Common Parameters of the Training API, setting no_done_at_end to False and soft_horizon to True results in not resetting the environment at the horizon, but adding done=True at the end of the horizon.

As you want to run 200 episodes, I guess you need to set the horizon to 1940748. If you want the environment to run through the rows again after these 1940748 rows, it probably needs to be reset, so soft_horizon should be set to False; only if there are other variables in the environment that need to be kept between episodes would I set soft_horizon=True. Then no_done_at_end should also be set to False, as this signals to RLlib that the episode is over.
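
If it helps, here is a rough sketch of what that could look like with the plain Trainer (old config-dict API; "CSVRowEnv-v0" is just a placeholder for your CSV-backed environment, and the import path depends on your RLlib version). The Trainer itself has no episode-count stopping criterion, so you would call train() in a loop and check "episodes_total" in the result dict:

from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "CSVRowEnv-v0",             # placeholder for your CSV-backed env
    "batch_mode": "truncate_episodes",
    "horizon": 1940748,                # one pass over the CSV = one episode
    "soft_horizon": False,             # reset the environment at the horizon
    "no_done_at_end": False,           # emit done=True so RLlib sees the episode end
}
trainer = PPOTrainer(config=config)

# Stop once roughly 200 episodes have been collected.
while True:
    result = trainer.train()
    if result["episodes_total"] >= 200:
        break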

Hope this helps,
Simon

Arif, did you find the correct approach for your query? If yes, can you share your implementation details and experience?

cc: @arturn @Rohan138, any response?

I agree with @Lars_Simon_Zehnder.