How to stop Ray Tune from running out of disk space

I have this code:

    hyperopt_search = HyperOptSearch(
        metric="val_acc", mode="max")

    tuner = tune.Tuner(
        tune.with_resources(train_fn, {"cpu": 1}),
        tune_config=tune.TuneConfig(num_samples=100, search_alg=hyperopt_search),
        param_space=config_dict,
        run_config=RunConfig(local_dir='/home/pytorch_test/am/runs/'),
    )
    results = tuner.fit()
    best_result = results.get_best_result(metric="val_acc", mode="max")

where train_fn() is the training function for a PyTorch Lightning neural network model.

The code runs fine, but because Ray Tune saves the output from every single trial, I cannot complete a large run without running out of disk space (smaller runs complete without error).

Is there a parameter I can set in this code so that the full data isn't saved for every Ray Tune trial? At the end I only want the data for the best run according to the metric I've selected. Could someone explain how to do this, or how to take up less space in general?

Can you be more specific about which data you don't want to keep?
Is it the result files, i.e. the ones under /home/pytorch_test/am/runs/?

How often are you checkpointing?
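
If it turns out to be the per-trial checkpoints that are filling the disk, one knob worth looking at is how many checkpoints Ray keeps per trial. Here is a minimal sketch of how that could look with your Tuner, assuming a Ray 2.x install where RunConfig accepts a CheckpointConfig (train_fn and config_dict as in your snippet):

    from ray import tune
    from ray.air import RunConfig, CheckpointConfig
    from ray.tune.search.hyperopt import HyperOptSearch

    hyperopt_search = HyperOptSearch(metric="val_acc", mode="max")

    tuner = tune.Tuner(
        tune.with_resources(train_fn, {"cpu": 1}),
        tune_config=tune.TuneConfig(num_samples=100, search_alg=hyperopt_search),
        param_space=config_dict,
        run_config=RunConfig(
            local_dir='/home/pytorch_test/am/runs/',
            checkpoint_config=CheckpointConfig(
                # keep only the single best checkpoint per trial,
                # ranked by the metric you are already reporting
                num_to_keep=1,
                checkpoint_score_attribute="val_acc",
                checkpoint_score_order="max",
            ),
        ),
    )

With num_to_keep=1, older checkpoints are deleted as new ones are written, so each trial directory under /runs/ only ever holds one checkpoint plus its result logs.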

Thanks,

I checkpoint here:

    tune_report_callback = TuneReportCheckpointCallback(
        metrics={
            "val_loss": "val_loss",
            "val_acc": "val_acc",
        },
        filename="antimicrobial_ray_ckpt",
        on="validation_end",
    )


    trainer = pl.Trainer(default_root_dir=root_dir,
                         callbacks=[ModelCheckpoint(save_weights_only=False, mode="max", monitor="val_acc"),
                                    tune_report_callback,
                                    EarlyStopping(monitor="val_acc", mode="max", patience=patience_num)],
                         max_epochs=max_epochs_num,
                         logger=csv_logger,
                         # accelerator='gpu',
                         # devices=-1
                         )

So I guess I write a checkpoint after every validation pass? (I'm not sure how to tell how often I'm checkpointing.)

But yes, I guess it's the /runs/ folder I want to shrink so that only the best runs are kept (to be honest, I'm not a PyTorch expert, so whatever solution makes the most sense is fine).

Is there a way to do this?

Thanks

Can you try on="fit_end" instead?
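
That way the Tune checkpoint should be written once per trial, at the end of training, rather than after every validation epoch. A minimal sketch of the change to your callback (everything else stays the same):

    # write the Tune checkpoint once, at the end of fit,
    # instead of after every validation pass
    tune_report_callback = TuneReportCheckpointCallback(
        metrics={
            "val_loss": "val_loss",
            "val_acc": "val_acc",
        },
        filename="antimicrobial_ray_ckpt",
        on="fit_end",
    )

That should cut down the number of checkpoint files each trial writes under /runs/.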