Train on episode end


Is it possible to train an off-policy agent only at the end of each episode? My case is the following: an episode can last up to 50 iterations and, for example, the agent terminates the episode after 30 iterations. There is no training during interaction with the environment. Training is performed at the end of the episode, with 30 training iterations. I know that I can set “batch_mode” to “complete_episode”, but I do not know how to set the number of training iterations dynamically. Thanks in advance for the help!


The configuration contains a batch_mode argument, which indicates whether the trainer should truncate episodes to generate the rollouts or whether it should wait for completed episodes. This may be what you’re looking for.
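
Since batch_mode only describes how rollouts are cut, here is a rough, library-agnostic sketch of the loop described in the question: collect one full episode without training, then run as many training iterations as the episode had steps. Everything in it (ToyEnv, ToyAgent, run_episode_then_train, the batch size) is a hypothetical stand-in for illustration and not part of any library's API.

```python
import random
from collections import deque

MAX_STEPS = 50  # episode length cap taken from the question


class ToyEnv:
    """Hypothetical environment that terminates at a random step <= MAX_STEPS."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.t = 0
        self.end = MAX_STEPS

    def reset(self):
        self.t = 0
        self.end = self.rng.randint(1, MAX_STEPS)
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.end
        return float(self.t), 1.0, done  # (next_obs, reward, done)


class ToyAgent:
    """Hypothetical off-policy agent; it only counts its updates."""

    def __init__(self):
        self.updates = 0

    def act(self, obs):
        return 0

    def train_on_batch(self, batch):
        self.updates += 1  # a real agent would take a gradient step here


def run_episode_then_train(env, agent, buffer, batch_size=4):
    # Phase 1: interact with the environment, storing transitions, no training.
    obs, done, steps = env.reset(), False, 0
    while not done and steps < MAX_STEPS:
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = next_obs
        steps += 1

    # Phase 2: one training iteration per environment step just collected.
    for _ in range(steps):
        batch = random.sample(list(buffer), min(batch_size, len(buffer)))
        agent.train_on_batch(batch)
    return steps


env, agent, buffer = ToyEnv(seed=1), ToyAgent(), deque(maxlen=10_000)
steps = run_episode_then_train(env, agent, buffer)
print(steps, agent.updates)  # the two numbers are equal by construction
```

The number of updates follows the episode length because the training loop is bounded by the step counter from the rollout phase, so nothing has to be configured ahead of time. Whether a given trainer exposes a hook for this is a separate question that the sketch does not answer.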