"Iteration always 1" challenge

Even though I read the Tune FAQ article Why are all my trials returning “1” iteration?, I am concerned that my implementation is not working correctly. Indeed, I face exactly that issue: all 5 of my trials return 1 iteration.

How can I verify that the algorithm's training has actually been executed? Should I look at num_sgd_iter?

    # `config` is the PPO param space defined earlier; `stopping_criteria` is shown below.
    from ray import tune
    from ray.tune import CheckpointConfig, RunConfig  # in older Ray versions these live in ray.air

    tuner = tune.Tuner(
        "PPO",
        param_space=config,
        run_config=RunConfig(
            stop=stopping_criteria,
            checkpoint_config=CheckpointConfig(
                checkpoint_score_attribute="episode_reward_mean",
                checkpoint_score_order="max",
                checkpoint_frequency=2,
            ),
        ),
        tune_config=tune.TuneConfig(
            metric="episode_reward_mean",
            mode="max",
            num_samples=5,
            reuse_actors=False,
            max_concurrent_trials=3,
        ),
    )
    results = tuner.fit()

The interesting part here would be the stopping_criteria you defined. Can you share those with us?

RLlib returns a number of metrics that can be checked to see whether training took place, but yes, num_sgd_iter is a good candidate for seeing how often SGD optimization has been triggered.
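
For example, you can inspect what each trial actually reported after fitting (a minimal sketch, assuming Ray 2.x and the results grid returned by tuner.fit() in your snippet; the metric keys are the same ones you already use in your stopping criteria):

    # Iterate over the ResultGrid returned by tuner.fit() and print key metrics per trial.
    for result in results:
        print(
            result.metrics["training_iteration"],   # how many Tune iterations actually ran
            result.metrics["timesteps_total"],      # environment timesteps sampled in total
            result.metrics["episode_reward_mean"],  # mean episode return reported so far
        )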

The stopping criteria are as follows, still to be fine-tuned:

    stopping_criteria = {
        "training_iteration": 5,
        "timesteps_total": 6,
        "episode_reward_mean": 5,
    }

While the iter column in the trial summary output always remains 1, the num_sgd_iter and ts values do indeed differ between trials. However, num_sgd_iter is itself a tuning parameter, sampled per trial via

"num_sgd_iter": tune.randint(100, 1000),

The stopping criteria are OR conditions, i.e. as soon as one of them is met, training is stopped. A single PPO training iteration collects a full train batch of environment timesteps (several thousand by default), so timesteps_total already exceeds 6 after the first iteration; hence you see only one result per trial.

If you comment out timesteps_total and episode_reward_mean from the stopping criteria, each trial should run for the full 5 iterations.
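
Concretely, a sketch of the fix (re-enable the other criteria once you have picked realistic thresholds):

    stopping_criteria = {
        "training_iteration": 5,
        # "timesteps_total": 6,       # reached within the first iteration, stops the trial immediately
        # "episode_reward_mean": 5,   # re-enable later with a realistic target return
    }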
