tune.run() runs more iterations than `training_iteration`

Hi folks,

I am getting started with Tune and everything runs fine so far. However, there is one output I cannot explain:

In the results_df of the ExperimentAnalysis that tune.run() returns, I get episodes_this_iter=3 and episodes_total=3, even though I set training_iteration=1 in my stop configuration. Why is this? How often does tune.run() call the Trainer.train() method per trial?

Here are my configs:

from ray import tune

config = {
    "env": MyEnv,
    "env_config": {
        "ts": 50,
        "mu": 20,
        "sigma": 0.05,
    },
    "num_workers": 3,
    "create_env_on_driver": True,
    "model": {
        "custom_model_config": {
            "multiplicator": tune.grid_search([1, 2, 3, 4]),
        },
    },
    "batch_mode": "complete_episodes",
    "rollout_fragment_length": 50,
    "train_batch_size": 50,
    "evaluation_num_episodes": 0,
}

stop_config = {
    "training_iteration": 1,
}

Is there a description of these result metrics, such as episodes_total, somewhere? And where can I find an overview of the configuration parameters available for config and stop in tune.run()?

Thanks for taking the time.

This likely means that it took 3 episodes to reach the train_batch_size requirement in your config of 50 environment steps.

What were you expecting to see? What are you trying to do?

@mannyv Thanks for your time. I want to better understand how tune.run() works. What I am trying to do here is run a single episode of 50 steps for each hyperparameter value. It is probably worth noting that my environment's step() function returns done once the timestep limit of 50 is reached, so an episode always stops after 50 timesteps.

From your answer above:

[…] it took 3 episodes to reach the train_batch_size requirement […]

I wonder whether I have to reduce num_workers to 1 when using "batch_mode": "complete_episodes". As I understand it right now, the "complete_episodes" mode ensures that my environment steps until done=True (so 50 steps), which in turn fits into the train_batch_size of 50. rollout_fragment_length=50 ensures that sample batches of size 50 are collected from the rollout workers, and num_workers=3 defines 3 such rollout workers. Maybe I have to cut num_workers down to 1 as well?

Hi @Lars_Simon_Zehnder,

I missed the num_workers when I was looking at it last time. Your reasoning is exactly correct. Since you are running three workers, each of them has its own environment that it samples experience from in parallel. Combined with complete_episodes, this means you get 3 episodes' worth of experience for every (internal) call to sample. Tune will keep calling sample until the number of collected timesteps is >= train_batch_size. There are a couple of other keys that can affect this, but I do not think they are being used here. One is min_iter_time_s and the other is timesteps_per_iteration. There is also a key learning_starts that delays the first training step until that many environment steps have been collected. This is usually used with off-policy algorithms to pre-fill the replay buffer, but I think it is valid for any algorithm.
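To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in plain Python. It does not call RLlib; the function name and the "one complete episode per worker per sampling round" assumption are mine, chosen to match the config above, where an episode is always 50 steps long.

```python
import math


def episodes_per_iteration(num_workers, episode_len, train_batch_size):
    """Rough model: every sampling round yields one complete episode per worker.

    Rounds are repeated until at least train_batch_size timesteps are collected.
    This is an illustration of the explanation above, not RLlib's actual code.
    """
    steps_per_round = num_workers * episode_len
    rounds = max(1, math.ceil(train_batch_size / steps_per_round))
    episodes = rounds * num_workers
    return episodes, episodes * episode_len


# Values from the config in this thread: 3 workers, 50-step episodes.
print(episodes_per_iteration(num_workers=3, episode_len=50, train_batch_size=50))
# Same setup with a single worker.
print(episodes_per_iteration(num_workers=1, episode_len=50, train_batch_size=50))
```

Under these assumptions the first call gives 3 episodes (150 timesteps) in one iteration, and the single-worker call gives 1 episode (50 timesteps).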

The rollout_fragment_length is ignored when batch_mode="complete_episodes", but if you had truncate_episodes instead, it would pull num_workers * rollout_fragment_length timesteps in each call to sample.

Hi @mannyv,

Thanks for the answer and the information. learning_starts in particular was unknown to me so far.

Regarding the num_workers parameter: this made it work. So, problem solved.

Regarding the rollout_fragment_length parameter: I tried running tune without this parameter. It fell back to the default value of 200, and with an episode length of 50 and 1 worker I got episodes_total=4. So this parameter appears to play an important role in the number of episodes. On the other hand, when I used a train_batch_size of 1 instead of 50, timesteps_total remained at 50. Perhaps train_batch_size only plays a role in learning the model, not in stepping through the environment?

By the way, I thought that increasing num_workers would run the trials for the different hyperparameter values in parallel. Instead, it appears that the rollout workers run the same trial several times. What is the reason behind this?
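One reading that is consistent with both numbers above is that, in complete_episodes mode, each worker keeps collecting whole episodes until it has at least rollout_fragment_length timesteps, and sampling rounds repeat until train_batch_size is reached. The sketch below is a plain-Python toy model of that hypothesis only; the function name is made up and nothing here calls RLlib.

```python
def simulate_iteration(num_workers, episode_len, rollout_fragment_length,
                       train_batch_size):
    """Toy model (an assumption, not RLlib code) of one training iteration.

    Each worker collects complete episodes until it holds at least
    rollout_fragment_length timesteps; rounds over all workers repeat
    until at least train_batch_size timesteps have been gathered.
    """
    episodes = 0
    timesteps = 0
    while True:
        for _ in range(num_workers):
            worker_steps = 0
            while True:
                worker_steps += episode_len  # one complete episode
                episodes += 1
                if worker_steps >= rollout_fragment_length:
                    break
            timesteps += worker_steps
        if timesteps >= train_batch_size:
            break
    return episodes, timesteps


# 1 worker, default rollout_fragment_length of 200, 50-step episodes.
print(simulate_iteration(1, 50, 200, 50))
# 1 worker, rollout_fragment_length=50, train_batch_size=1.
print(simulate_iteration(1, 50, 50, 1))
```

In this model the first call yields 4 episodes and the second yields 50 timesteps, matching the episodes_total and timesteps_total reported above. Under that reading, train_batch_size acts as a lower bound on the collected timesteps rather than an exact size, because whole episodes are never cut.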

The parameter you are looking for is the tune.run() parameter "num_samples".
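For context, num_workers sets the number of rollout workers inside a single trial, while the number of trials comes from the search space and num_samples. As far as I know, with grid_search every grid point is repeated num_samples times. The plain-Python sketch below only mimics that counting; expand_trials is a hypothetical helper, not part of Tune.

```python
from itertools import product


def expand_trials(grid, num_samples=1):
    """Mimic how a grid search expands into trials (illustration only).

    grid maps a parameter name to its list of grid values; the whole grid
    is repeated num_samples times.
    """
    keys = list(grid)
    combos = [dict(zip(keys, values))
              for values in product(*(grid[k] for k in keys))]
    return [dict(combo) for _ in range(num_samples) for combo in combos]


# The grid from the config in this thread.
grid = {"multiplicator": [1, 2, 3, 4]}
print(len(expand_trials(grid, num_samples=1)))
print(len(expand_trials(grid, num_samples=2)))
```

This gives 4 trials for num_samples=1 and 8 for num_samples=2. How many of those trials actually run at the same time depends on the resources available to Tune, not on num_workers.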