How to set directory where checkpoints are saved

I run the following Tuner:

os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = "1"
os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"
os.environ["TUNE_RESULT_DIR"] = dirname_

tuner = tune.Tuner(
    tune.with_resources(
        tune.with_parameters(train, X_original=X_original, y=y),
        resources={"cpu": 10, "gpu": gpus_per_trial},
    ),
    tune_config=tune.TuneConfig(
        metric="loss",
        mode="min",
        scheduler=scheduler,
        num_samples=num_samples,
    ),
    # run_config=run_config_,
    param_space=config,
)
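For reference, here is a minimal sketch of how the commented-out run_config could be filled in so that results and checkpoints are written under the F: drive. The storage_path value and the experiment name below are illustrative assumptions on my side (RunConfig lives in ray.train in recent Ray releases), not something taken from the original code:

from ray.train import RunConfig

run_config_ = RunConfig(
    # Root directory under which Ray Tune persists trial results and checkpoints.
    storage_path="F:/rayCheckpoint",
    # Hypothetical experiment name; results end up under storage_path/<name>.
    name="my_tune_experiment",
)

# Then pass it to the Tuner instead of leaving it commented out:
# tuner = tune.Tuner(..., run_config=run_config_, param_space=config)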

My train function contains the following code:

def train(config, X_original, y):
    # ... define model, optimizer, etc.

    # Load an existing checkpoint through the `get_checkpoint()` API.
    if train.get_checkpoint():
        loaded_checkpoint = train.get_checkpoint()
        with loaded_checkpoint.as_directory() as loaded_checkpoint_dir:
            model_state, optimizer_state = torch.load(
                os.path.join(loaded_checkpoint_dir, "checkpoint.pt")
            )
            net.load_state_dict(model_state)
            optimizer.load_state_dict(optimizer_state)

Then, at the end of each training epoch:

# Requires `import tempfile` and `from ray.train import Checkpoint`.
with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
    # temp_checkpoint_dir = "F:/rayCheckpoint"
    path = os.path.join(temp_checkpoint_dir, "checkpoint.pt")
    torch.save((net.state_dict(), optimizer.state_dict()), path)
    checkpoint = Checkpoint.from_directory(temp_checkpoint_dir)
    train.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=checkpoint,
    )
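As a side note (not from the original post): if the concern is only that the intermediate checkpoint files are staged in the system Temp folder on C:, the standard-library tempfile.TemporaryDirectory accepts a dir argument, so the scratch directory itself can live on the F: drive. The path below is an assumption for illustration and must already exist; train.report still copies the reported checkpoint into the experiment's storage path afterwards.

import tempfile

# Hypothetical scratch location on the F: drive (the directory must exist beforehand).
with tempfile.TemporaryDirectory(dir="F:/rayScratch") as temp_checkpoint_dir:
    path = os.path.join(temp_checkpoint_dir, "checkpoint.pt")
    torch.save((net.state_dict(), optimizer.state_dict()), path)
    train.report(
        {"loss": val_loss / val_steps, "accuracy": correct / total},
        checkpoint=Checkpoint.from_directory(temp_checkpoint_dir),
    )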

But I have limited space on my laptop's main drive and I want to save the checkpoints on a separate disk ("F:/rayCheckpoint") instead of the folders generated under AppData/Temp. If I pass a RunConfig to the Tuner, I am not able to get the metrics from the checkpoint. Can anybody help me understand?

Could you explain what you mean by this?

Setting RunConfig(storage_path) will determine where the final checkpoints can be found after training is finished. See How to Configure Persistent Storage in Ray Tune — Ray 2.8.1 for details.

Hi @matthewdeng , thanks for the reply!

I managed to track down the misunderstanding. Basically, I was setting the Tuner's output path in RunConfig (or via the "TUNE_RESULT_DIR" variable), but afterwards, when selecting the best model and calling, e.g.:

results = tuner.fit()
best_result = results.get_best_result("loss", "min")
best_result.checkpoint.to_directory()

I was forgetting to include the path argument, i.e. best_result.checkpoint.to_directory(path=dirname_), and thus I was creating several folders in Temp directories. Because of this I thought the Tuner was not selecting the best result based on the metrics, but when I went through the code of the .to_directory method I understood that it creates the folder based on the Result class.
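For completeness, a small usage sketch (reusing the dirname_ variable from earlier in the thread; the "best_checkpoint" subfolder name is just an example) of pulling the metrics from the best result and materializing its checkpoint into an explicit directory instead of a Temp folder:

results = tuner.fit()
best_result = results.get_best_result(metric="loss", mode="min")

# Metrics reported via train.report are available directly on the Result object.
print(best_result.metrics["loss"], best_result.metrics["accuracy"])

# Write the checkpoint files to an explicit location rather than a Temp directory.
checkpoint_dir = best_result.checkpoint.to_directory(
    path=os.path.join(dirname_, "best_checkpoint")
)
model_state, optimizer_state = torch.load(
    os.path.join(checkpoint_dir, "checkpoint.pt")
)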

Let me also ask: do the train.report method and return do the same thing with respect to reporting metrics to the Tuner? Do they overwrite each other?