I’m having trouble retrieving the best checkpoint of a trial. This is the part of my training loop where I save a checkpoint and report my metrics:
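```python
import json
import os
import torch
from ray import tune

# Inside the training loop, once per epoch:
metrics = {"loss": val_loss, "running_loss": running_loss}
with tune.checkpoint_dir(epoch) as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "checkpoint")
    torch.save((model, optimizer.state_dict()), path)
    # Write the metrics as JSON to the same path
    with open(path, "w") as f:
        f.write(json.dumps(metrics))
tune.report(metrics)
```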
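To see what actually ends up on disk, here is a small stdlib-only sketch of the two writes in the snippet. It assumes `pickle.dump` as a stand-in for `torch.save` (both just write a binary payload to `path`), and `save_like_snippet` is a hypothetical helper name, not part of Ray Tune. Because the second `open(path, "w")` truncates the same file, only the JSON metrics remain in `checkpoint`:

```python
import json
import os
import pickle
import tempfile


def save_like_snippet(checkpoint_dir, model_state, metrics):
    """Hypothetical helper mimicking the snippet's two writes to one path.

    pickle.dump stands in for torch.save (assumption: both write a binary
    payload to the given file).
    """
    path = os.path.join(checkpoint_dir, "checkpoint")
    # First write: the binary "checkpoint" payload
    with open(path, "wb") as f:
        pickle.dump(model_state, f)
    # Second write: mode "w" truncates the file, discarding the payload above
    with open(path, "w") as f:
        f.write(json.dumps(metrics))
    return path


with tempfile.TemporaryDirectory() as d:
    p = save_like_snippet(d, {"weights": [1.0, 2.0]}, {"loss": 0.5, "running_loss": 0.7})
    with open(p) as f:
        contents = f.read()

print(contents)  # only the JSON metrics remain
```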
When I call `result.get_best_checkpoint`, it returns `None`. What am I doing wrong?