Questions about tune stopping condition with PBT

Hi there!
I hope you’re having a great day.

I have some questions about Population Based Training with Ray Tune, and more specifically about when it stops tuning.

For more context, here's part of my code (I'm using tune.run, which I know is the soon-to-be-deprecated API):

    scheduler = PopulationBasedTraining(
        hyperparam_mutations=hyperparam_mutations,
        time_attr="training_iteration",
        metric="val_loss",
        mode="min",
        perturbation_interval=1,
    )

    reporter = CLIReporter(
        parameter_columns=parameters_to_display,
        metric_columns=["val_loss", "epoch"],
        metric="val_loss",
        mode="min",
    )

    results = tune.run(
        partial(training_func),
        config=config,
        scheduler=scheduler,
        num_samples=2,
        progress_reporter=reporter,
        checkpoint_score_attr="training_iteration",
        keep_checkpoints_num=1,
        name=experience,
        checkpoint_at_end=True,
        local_dir="./ray_results",
        log_to_file=True,
        resources_per_trial={"cpu": args.cpu, "gpu": args.gpu},
        resume="AUTO",
        sync_config=tune.SyncConfig(syncer=None),
    )

Here, syncer is set to None because I'm running this code under the SLURM cluster manager.

I'm using the function API (training_func), where I train over X epochs (not an infinite loop). I call session.report with a checkpoint at the end of each epoch (to log info via tensorboardX).
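To make the shape of that function concrete, here's a toy version of what I mean. It's plain Python with no Ray imports: `report` is just my stand-in for session.report, and `train_one_epoch` / `num_epochs` are placeholder names, not real API:

```python
reported = []

def report(metrics):
    # stand-in for session.report(metrics) + checkpointing at each epoch
    reported.append(metrics)

def train_one_epoch(epoch):
    # placeholder: pretend the validation loss improves each epoch
    return 1.0 / (epoch + 1)

def training_func(config):
    # fixed-range loop over X epochs, not `while True`
    for epoch in range(config["num_epochs"]):
        val_loss = train_one_epoch(epoch)
        report({"val_loss": val_loss, "epoch": epoch})
    # returning here is, as far as I understand, what marks the trial as done

training_func({"num_epochs": 3})
print(len(reported))  # prints 3
```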

From my understanding, because I only train over X epochs, a trial is considered done when it reaches the last epoch and exits training_func. When all trials (here, 2) are finished, the run is complete and tuning stops.
Is that correct?

If so, is there any way to continue tuning (maybe by resetting trials periodically) and only stop after some conditions are met (via the stop option)? Should I use more trials (increase num_samples)? If the solution is to use an infinite loop inside training_func (something like while True), is there a way to prevent trials from going past a certain epoch (I need to compare results at a defined epoch)? Or to retrieve the best trial at a precise epoch?
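To illustrate what I'm asking about with the while True option, here's a toy simulation (again no Ray imports; `run_trial` is just my mental model of how Tune counts one training_iteration per report and applies something like stop={"training_iteration": N}):

```python
def run_trial(training_iter, stop_iteration=None):
    # drive a generator-style trial, counting one training_iteration per
    # report, and break once the stop condition is hit
    iteration = 0
    for _metrics in training_iter():
        iteration += 1
        if stop_iteration is not None and iteration >= stop_iteration:
            break
    return iteration

def fixed_epochs():
    # fixed-epoch loop: exits on its own after 5 reports -> trial done
    for epoch in range(5):
        yield {"epoch": epoch}

def endless():
    # `while True` loop: never exits by itself; needs an external stop
    epoch = 0
    while True:
        yield {"epoch": epoch}
        epoch += 1

print(run_trial(fixed_epochs))               # prints 5
print(run_trial(endless, stop_iteration=8))  # prints 8
```

If that model is right, the stop option would let an open-ended loop end at a chosen epoch, which is what I'd need to compare results at a defined epoch.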

Thanks for your insight!

Please ignore this post. It's a duplicate of Question - About tune stopping condition with PBT