Tuning exits without executing all trials

Despite setting num_samples to 5, the hyperparameter search exits after only 2 trials. If I increase num_samples to, say, 50, it exits after around 30.

Could someone please explain the behaviour?

If I understand EarlyStopping correctly, it should prevent individual trials from running all epochs rather than stop the entire experiment. I’m using the PopulationBasedTraining scheduler and EarlyStopping as follows:

from ray import tune

# train, hyper_config, hyper_metric and hyper_mode are defined elsewhere
scheduler = tune.schedulers.PopulationBasedTraining(
        time_attr='time_total_s',
        perturbation_interval=600.0,
        hyperparam_mutations=hyper_config)

analysis = tune.run(
        train,
        metric=hyper_metric,
        mode=hyper_mode,
        verbose=1,
        config=hyper_config,
        resources_per_trial={
            "cpu": 2,
            "gpu": 1},
        num_samples=5,
        stop=tune.stopper.EarlyStopping('epoch_dev_loss', top=10, mode='min', patience=4),
        scheduler=scheduler,
        keep_checkpoints_num=1,
        fail_fast=True)

Hi @false,

EarlyStopping is a (deprecated) alias for ExperimentPlateauStopper. This should be evident from the warning that you should see when running the experiment:

“The EarlyStopping stopper has been renamed to ExperimentPlateauStopper. The reference will be removed in a future version of Ray. Please use ExperimentPlateauStopper instead.”

And as you can see in the documentation, it does indeed stop the entire experiment when the metrics stop improving.

What you’re looking for is probably the TrialPlateauStopper, or alternatively using the ASHAScheduler (instead of population based training).
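
For illustration, a per-trial stopper could look roughly like this (a minimal sketch: the metric name epoch_dev_loss is taken from your snippet, while the std, num_results and grace_period values are placeholders you’d want to adjust):

# Stops an individual trial (not the whole experiment) once its metric plateaus
stopper = tune.stopper.TrialPlateauStopper(
        metric='epoch_dev_loss',
        mode='min',
        std=0.001,       # trial counts as plateaued once the std over the window drops below this
        num_results=4,   # number of recent results the std is computed over
        grace_period=4)  # minimum number of results before a trial can be stopped

You would then pass this as stop=stopper to tune.run instead of the EarlyStopping instance.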

Please note that you should think about whether this is really the right thing to do in your case. In population based training, poorly performing trials exploit other trials. There’s likely no benefit in early stopping, as this will just trigger new trials to start. These new trials usually perform worse than existing well-performing trials (not least because the latter have trained for longer) and will end up exploiting those well-performing trials anyway.

Thank you for the explanation @kai. TrialPlateauStopper is definitely what I want.

There’s likely no benefit in early stopping, as this will just trigger new trials to start

Could you please elaborate on this (with respect to TrialPlateauStopper)?

As far as I understand, this is desirable behaviour, since running a trial that has hit a plateau for more epochs is unlikely to result in better performance. I’m therefore struggling to understand why running a trial with the same hyperparameters for more epochs would benefit PBT: if I stop it earlier, PBT already has the information on how the given hyperparameters perform, and the scheduler can take the same actions it would take after the full number of epochs.

So my explanation in the last post actually approached this problem from the wrong side, i.e. early stopping poorly performing trials. But you’re early stopping well-performing trials, which makes much more sense.

Still, you could run into problems here, because terminated trials cannot be exploited by other trials anymore. It may be that your best performing trial plateaued, but more improvement would be possible if you continued to train it with different hyperparameters.

It may be that your best performing trial plateaued, but more improvement would be possible if you continued to train it with different hyperparameters.

Here lies my confusion. I thought that a single trial corresponds to a single set of hyperparameter values, as it would make little sense to change e.g. the optimizer’s configuration between epochs?

Well, this is exactly what PopulationBasedTraining does: it changes hyperparameters during the course of the run. Thus you end up with a hyperparameter schedule instead of a single configuration.

This blog post by DeepMind explains PBT pretty well: Population based training of neural networks | DeepMind
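
To make the “hyperparameter schedule” idea more concrete, here is a rough sketch of what a PBT-compatible trainable can look like with the function API used in your tune.run call. The checkpoint format, the run_one_epoch helper and the epoch_dev_loss metric are just assumptions for illustration; the point is that a trial may be restarted from another trial’s checkpoint with a mutated config.

import json
import os

def train(config, checkpoint_dir=None):
    step = 0
    # When PBT exploits another trial, this trial restarts from that trial's
    # checkpoint, but continues with a (possibly mutated) copy of its config.
    if checkpoint_dir:
        with open(os.path.join(checkpoint_dir, "state.json")) as f:
            step = json.load(f)["step"]

    while True:
        # config["beta1"] may differ from the value originally sampled for this trial
        loss = run_one_epoch(beta1=config["beta1"])  # hypothetical training step
        step += 1

        # Save a checkpoint so other trials can exploit this one later
        # (a real trainable would also save model and optimizer state here)
        with tune.checkpoint_dir(step=step) as cp_dir:
            with open(os.path.join(cp_dir, "state.json"), "w") as f:
                json.dump({"step": step}, f)

        tune.report(epoch_dev_loss=loss)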


Thank you for the link, I’ll make sure to give it a read.

On a side note, would you have any idea why a parameter value passed to my experiment could be outside of the specified distribution?

In my config I’ve specified beta1 (i.e. the first value of the Adam beta pair) to be sampled from the range [0.89, 0.99] as follows:

config = {
    "beta1": tune.uniform(0.89, 0.99),
    ...}

But the actual value in one of the trials is 1.081497759889524 (according to the trial’s params.json and a PyTorch error).

The directory name for the trial states beta1=0.91005,....

Yes, this is also an effect of population based training.

In a nutshell: by default, the bottom 25% of the trials are stopped and exploit the latest checkpoint of one of the trials in the top 25%. This means they restore that trial’s latest checkpoint and copy its hyperparameters. Because it doesn’t make much sense to train two trials with the same parameters, PBT mutates these hyperparameters, which means that with a certain probability the parameters will be slightly altered.

In vanilla PBT, mutating a hyperparameter multiplies its value by 0.8 or 1.2. In your case, 0.91005... * 1.2 ~= 1.08149.... This will all become a lot clearer once you read the blog post or the original paper.

If you strictly want to keep the parameters in bounds, either use a categorical choice or a custom_explore_fn that clips the parameters back into bounds (see also the docs).
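
A minimal sketch of the custom_explore_fn option, using the bounds from your config (the function name is just a placeholder):

def keep_in_bounds(config):
    # Clip the mutated value back into the sampled range [0.89, 0.99]
    config["beta1"] = min(max(config["beta1"], 0.89), 0.99)
    return config

scheduler = tune.schedulers.PopulationBasedTraining(
        time_attr='time_total_s',
        perturbation_interval=600.0,
        hyperparam_mutations=hyper_config,
        custom_explore_fn=keep_in_bounds)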


Thank you for clarifying.