Tuning process with PBT is killed after a very small number of iterations (6/500)

LucaCappelletti94 · December 14, 2020, 9:27am

Hello,

I understand that this question will be vague, and that is mainly due to my ignorance of the PBT mechanism.

I am trying to tune a CNN model on genomic sequence data on a relatively easy task: the first model I came up with in 5 minutes achieved validation scores of 0.92 AUPRC and 0.94 AUROC.

Having established that the task is relatively easy (hence most models will achieve a decent result), I wanted to tune the meta-model with PBT as a way of learning how to use it.

Having defined the parameters as the number of filters and the kernel size, plus some activation regularization weights, I have tried to use PBT as follows:

First, I defined a space of parameters using uniform lambdas, as shown in the tutorial here. I use double lambdas only to capture the local variables defining each range.

space = {
    key: (lambda vr: lambda: np.random.uniform(*vr))(val_range)
    for key, val_range in model.space().items()
}
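A note on why the double lambda matters: Python closures look up their variables when they are called, not when they are defined, so a single lambda inside the comprehension would see only the last range. The sketch below illustrates this under some assumptions: it uses the standard library's `random.uniform` in place of `np.random.uniform`, and the `ranges` dict is a made-up stand-in for `model.space()`, whose real contents are not shown in the post.

```python
import random

# Hypothetical ranges standing in for model.space(); the real keys are not shown.
ranges = {"filters": (16, 128), "kernel_size": (2, 10)}

# Single lambda: every sampler reads val_range at call time, i.e. the LAST range.
late = {key: (lambda: random.uniform(*val_range)) for key, val_range in ranges.items()}

# Double lambda: the outer call binds vr immediately, one range per sampler.
bound = {
    key: (lambda vr: lambda: random.uniform(*vr))(val_range)
    for key, val_range in ranges.items()
}

late_filters = [late["filters"]() for _ in range(100)]    # drawn from (2, 10)
bound_filters = [bound["filters"]() for _ in range(100)]  # drawn from (16, 128)
```

A default argument, `lambda vr=val_range: random.uniform(*vr)`, is an equivalent and somewhat more common way to bind the value early.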

Then I define the train method as follows:

def train_convnet(config):
    import silence_tensorflow.auto
    window_size = 256
    train, test = create_training_sequence(window_size)
    meta_model: Model = build_model(window_size)
    meta_model.space()
    model = meta_model.build(**config)
    model.compile(
        optimizer='nadam',
        loss="binary_crossentropy",
        metrics=[
            "accuracy",
            AUC(curve="PR", name="auprc"),
            AUC(curve="ROC", name="auroc")
        ]
    )
    model.fit(
        train,
        validation_data=test,
        epochs=1000,
        verbose=False,
        callbacks=[
            TuneReportCallback(metrics="val_auprc"),
            EarlyStopping(monitor="auprc", patience=5)
        ]
    )

And finally I call the tuning process:

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining
from ray.tune.stopper import EarlyStopping as TuneEarlyStopping

scheduler = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=5,
    hyperparam_mutations=space
)

analysis = tune.run(
    train_convnet,
    name="pbt_test",
    scheduler=scheduler,
    metric="val_auprc",
    mode="max",
    verbose=1,
    stop=TuneEarlyStopping("val_auprc"),
    resources_per_trial={
        "cpu": cpu_count()//4,
        "gpu": 1
    },
    num_samples=500
)
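For readers unfamiliar with what the scheduler does with `hyperparam_mutations`: when a hyperparameter is given as a callable, the PBT perturbation step is generally described as either resampling it by calling the function or scaling the current value by a fixed factor. The following is a simplified model of that step, not Ray's code. The function name `explore`, the 0.25 resampling probability, the 0.8/1.2 factors and the example range are all assumptions made for illustration.

```python
import random

def explore(config, mutations, resample_probability=0.25):
    """Simplified model of a PBT perturbation step (illustrative, not Ray's code)."""
    new_config = dict(config)
    for key, sampler in mutations.items():
        if random.random() < resample_probability:
            new_config[key] = sampler()            # draw a fresh value
        else:
            factor = random.choice([0.8, 1.2])     # nudge the current value
            new_config[key] = config[key] * factor
    return new_config

# Hypothetical range for one hyperparameter.
mutations = {"filters": lambda: random.uniform(16, 128)}
perturbed = explore({"filters": 64.0}, mutations)
```

Note that a uniform sampler returns floats, so parameters such as the number of filters or the kernel size may need rounding before they reach the model-building code.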

All the tuning processes are then terminated, having achieved an AUPRC of at most 0.45.

What am I doing wrong? What information is needed to properly resolve this issue?

Due to the confidential nature of the training labels, I cannot share an example of the dataset, but I believe that the issue at hand has little to do with the task itself and more to do with how I am using Tune and PBT.

Thank you,
Luca

rliaw · December 15, 2020, 8:22am

Can you try removing the EarlyStopping parameters?

LucaCappelletti94 · December 15, 2020, 8:24am

Which of the early stopping ones? The ones in Keras's EarlyStopping callback?

rliaw · December 15, 2020, 8:27am

Can you try removing both at first?

LucaCappelletti94 · December 15, 2020, 8:29am

I am now running the experiment with neither of them. I wonder whether I am failing to pass the optimization mode somewhere, so that what I always see reported is the minimum of the AUPRC instead of the maximum.
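The direction of optimization matters for any patience-based stopping rule. The sketch below is a simplified, library-independent model of such a rule; the function `stopped_epoch` and the scores are made up for illustration and are not taken from Keras or Ray. It shows that a steadily improving score is stopped after `patience` epochs without "improvement" when the rule is told to minimize.

```python
def stopped_epoch(scores, patience, mode):
    """Return the 1-based epoch at which a patience rule stops, or None."""
    if mode == "max":
        better = lambda new, best: new > best
    else:
        better = lambda new, best: new < best
    best, waited = None, 0
    for epoch, score in enumerate(scores, start=1):
        if best is None or better(score, best):
            best, waited = score, 0      # improvement: reset the counter
        else:
            waited += 1                  # no improvement in the chosen direction
            if waited >= patience:
                return epoch
    return None

# Made-up scores that improve at every epoch.
rising = [0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
as_max = stopped_epoch(rising, patience=5, mode="max")  # never stops
as_min = stopped_epoch(rising, patience=5, mode="min")  # stops early
```

Keras's EarlyStopping accepts an explicit `mode="max"` argument; passing the direction explicitly wherever a stopping rule monitors an AUPRC-like metric removes this ambiguity.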

LucaCappelletti94 · December 15, 2020, 8:37am

So far I am seeing all the models converging to the very same value of val_auprc, with extraordinary precision. I don’t think overfitting of the model is an issue, seeing how easy the task at hand is.

If you’d like to have a call to see first hand the complete notebook please do let me know on Slack.

Thanks!

rliaw · December 15, 2020, 11:05pm

Hmm, can you set verbose=3 for tune.run to see which metrics are actually being updated?

LucaCappelletti94 · January 13, 2021, 8:25am

Sorry for the delay, got sidetracked with another project.
Apparently, the hyper-parameter space was too vast for the BO, even with 100 initial random steps.
If I significantly restrict the hyper-parameter space, the performance increases, but the models still do not reach the performance of the quickly hand-picked model, even after more than 600 iterations.
I don’t understand why this is happening.

LucaCappelletti94 · January 13, 2021, 8:26am

The early stopping class from Tune is what still kills the process after the patience number of iterations (even with the restricted hyper-parameter space). I guess there is something wrong in there, as I can see that the performance is still improving.
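One possible explanation, offered as an assumption rather than a confirmed diagnosis: a plateau-style stopper typically keeps the best few values reported so far and stops once they are nearly identical. The sketch below is a simplified, pure-Python model of such a rule. The function `has_plateaued`, its default thresholds and the numbers are all illustrative and are not Ray's implementation.

```python
from statistics import pstdev

def has_plateaued(values, top=10, std=0.001, mode="min"):
    """Return True when the `top` best values seen so far are nearly identical."""
    ordered = sorted(values)
    best = ordered[:top] if mode == "min" else ordered[-top:]
    return len(best) == top and pstdev(best) <= std

# Ten near-identical results (made-up numbers).
flat = [0.4500, 0.4501, 0.4499, 0.4500, 0.4502,
        0.4500, 0.4501, 0.4499, 0.4500, 0.4501]
stops_early = has_plateaued(flat, mode="max")

# Results that are still spreading out do not trigger the rule.
improving = [0.40 + 0.02 * i for i in range(10)]
keeps_going = not has_plateaued(improving, mode="max")
```

If the rule pools results across all trials, a population that reports almost the same value during its first few iterations would satisfy it very quickly, whatever happens later in training.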
