Hi, I have a quick related question.
I am running the code:
result = tune.run(
    tune.with_parameters(train),
    resources_per_trial={"cpu": 12, "gpu": gpus_per_trial},
    config=config,
    num_samples=num_samples,
    search_alg=hyperopt_search,
    scheduler=scheduler,
    progress_reporter=reporter,
    keep_checkpoints_num=1,
    checkpoint_score_attr="loss",
)
best_trial = result.get_best_trial("accuracy", "max", "last")
print("Best trial config: {}".format(best_trial.config))
print("Best trial final validation accuracy: {}".format(best_trial.last_result["accuracy"]))
best_trained_model = LeNet5().to(device)
best_checkpoint_dir = best_trial.checkpoint.value
model_state, optimizer_state = torch.load(os.path.join(best_checkpoint_dir, "checkpoint"))
best_trained_model.load_state_dict(model_state)
The code was running without throwing an error before but now I am getting this error:
best_checkpoint_dir = best_trial.checkpoint.value
AttributeError: '_TrackedCheckpoint' object has no attribute 'value'
It finds the best config and prints it, but it cannot retrieve the checkpoint for that config.
I would be very glad if you could help with this issue, thanks =)
Hi,
Instead of doing best_trial.checkpoint.value, can you try result.best_checkpoint?
Hi, thank you so much for such a quick reply, I like the Ray Team.
I solved the problem as you suggested. I am sharing the code in case it helps other people with the same question.
best_checkpoint.to_directory(path="directory") creates a folder named directory, saves the best checkpoint inside it, and returns the folder path.
Then we can load that checkpoint with torch.load(os.path.join(best_checkpoint_dir, "checkpoint")).
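result = tune.run(
    tune.with_parameters(model.update_representation),
    resources_per_trial={"cpu": 12, "gpu": 1},
    config=config,
    num_samples=2,
    search_alg=hyperopt_search,
    scheduler=scheduler,
    keep_checkpoints_num=1,
    # The "min-" prefix keeps the lowest-loss checkpoint; a plain "loss"
    # would rank the checkpoint with the highest loss as the best one.
    checkpoint_score_attr="min-loss",
)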
best_trial = result.get_best_trial("accuracy", "max", "last")
print("Best trial config: {}".format(best_trial.config))
print("Best trial final validation loss: {}".format(best_trial.last_result["loss"]))
print("Best trial final validation accuracy: {}".format(best_trial.last_result["accuracy"]))
best_checkpoint = result.get_best_checkpoint(trial=best_trial, metric="accuracy", mode="max")
best_checkpoint_dir = best_checkpoint.to_directory(path="directory")
model_state, optimizer_state = torch.load(os.path.join(best_checkpoint_dir, "checkpoint"))
# best_trained_model is the same object as model._network, so loading once is enough.
best_trained_model = model._network
best_trained_model.load_state_dict(model_state)
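
In case the same script has to run on both older and newer Ray versions, here is a minimal sketch of a helper that returns a checkpoint directory either way. The helper name resolve_checkpoint_dir and the two stub classes are hypothetical and only for illustration; they are not part of Ray. The only assumptions about Ray are the two behaviours seen in this thread: the older checkpoint object exposed its directory as .value, and the newer one offers to_directory(path=...), which returns the directory it wrote to.

```python
import os
import tempfile


def resolve_checkpoint_dir(checkpoint, target_dir):
    """Return a directory that holds the checkpoint files.

    Hypothetical helper (not a Ray API). It relies on duck typing only,
    so it never imports Ray itself.
    """
    # Newer-style objects can write themselves into a directory and
    # return the path they wrote to.
    to_directory = getattr(checkpoint, "to_directory", None)
    if callable(to_directory):
        return to_directory(path=target_dir)

    # Older-style objects stored the checkpoint directory in .value.
    value = getattr(checkpoint, "value", None)
    if isinstance(value, str):
        return value

    raise TypeError(
        f"Cannot resolve a checkpoint directory from {type(checkpoint).__name__}"
    )


# Stand-ins for the two kinds of checkpoint objects, for illustration only.
class OldStyleCheckpoint:
    def __init__(self, value):
        self.value = value


class NewStyleCheckpoint:
    def to_directory(self, path=None):
        os.makedirs(path, exist_ok=True)
        return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "directory")
        print(resolve_checkpoint_dir(OldStyleCheckpoint("/some/old/path"), target))
        print(resolve_checkpoint_dir(NewStyleCheckpoint(), target))
```

With something like this, the loading line stays the same in both cases: torch.load(os.path.join(resolve_checkpoint_dir(best_checkpoint, "directory"), "checkpoint")). An object that offers neither attribute raises a TypeError with the class name in the message instead of failing later with an AttributeError.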