Saving the best model at the end of training

Hi guys, I am trying to run a hyperparameter search using tune.with_parameters, because my data is too big. My training function takes the parameters config, data and checkpoint_dir, and I am currently saving the model after every trial. Is there a way, using tune.run and tune.with_parameters, to save only the best model to disk after all the trials have run? Can I achieve this with checkpointing? My first idea was to use the Trainable class as in this example: [tune] How to checkpoint best model · Issue #10290 · ray-project/ray · GitHub, but as far as I understand I can't mix the Trainable class with tune.with_parameters. Moreover, when I tried, it didn't work.

My current solution (code snippet), where config holds the pipeline hyperparameters (for the transformers and models):

from ray import tune


def train_model(config, data, checkpoint_dir=None):
    (train_data, y_train, dev_data, y_valid) = data
    model = ModelPipeline.from_config(config)

    model.fit(train_data, y_train, validation_data=(dev_data, y_valid))
    dev_metric_results = Evaluation(metrics=['custom_metrics']) \
        .evaluate(model=model, X=dev_data, y_true=y_valid)

    # Save the trained model into a Tune-managed checkpoint directory
    # (renamed to ckpt_dir to avoid shadowing the checkpoint_dir argument).
    with tune.checkpoint_dir(step=0) as ckpt_dir:
        model.save(ckpt_dir)

    tune.report(custom_metrics=dev_metric_results['custom_metrics'])
        
analysis = tune.run(
    tune.with_parameters(train_model, data=data),
    name=name,
    config=config,
    num_samples=num_samples,
    time_budget_s=time_budget,
    verbose=verbose,
    resources_per_trial=resources,
    metric='custom_metrics',
    mode='max',
    keep_checkpoints_num=1,
    checkpoint_freq=1,
    checkpoint_score_attr='custom_metrics')

Hi, you can access the checkpoint of the best-performing trial like this:

best_checkpoint_dir = analysis.best_checkpoint

For more information you can take a look here: Analysis (tune.analysis) — Ray v1.1.0
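
To then restore only that model, something like the following should work. This is a minimal sketch assuming analysis.best_checkpoint is a plain directory path (as in Ray 1.x) and that ModelPipeline has a load() counterpart to the save() used in train_model above; the load call is hypothetical, so substitute however your pipeline is actually restored.

best_config = analysis.best_config  # hyperparameters of the best trial

# Hypothetical restore call - use whatever mirrors ModelPipeline.save().
best_model = ModelPipeline.load(best_checkpoint_dir)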

Hi, thanks for the answer. My question isn't about finding the path of the best checkpoint (which would still require checkpointing the model after each trial), but about having only one model (the best one) saved on disk once all the trials are done.

I am afraid this is not currently supported. There is at least one checkpoint per trial.

Would it work if you had a separate, customized process that monitors trial results and proactively deletes the checkpoints of less performant trials?
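
A rough post-run variant of that cleanup idea is sketched below. This is not built-in behavior; it assumes analysis.best_checkpoint is a plain path (Ray 1.x), the default local_dir of ~/ray_results, and that checkpoints live in checkpoint_* folders under each trial directory.

import glob
import os
import shutil

best_checkpoint_dir = os.path.normpath(analysis.best_checkpoint)
experiment_dir = os.path.expanduser(os.path.join("~/ray_results", name))

# Remove every trial checkpoint except the best one, so that only a single
# saved model remains on disk after the experiment finishes.
for ckpt in glob.glob(os.path.join(experiment_dir, "*", "checkpoint_*")):
    if os.path.normpath(ckpt) != best_checkpoint_dir:
        shutil.rmtree(ckpt, ignore_errors=True)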

In the long run, it may be helpful for Tune to provide some API for interacting with an ongoing experiment (deleting the checkpoints of less performant trials could then be built on top of such an API).