How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Here are snippets of my code:
Original tuner experiment:
from ray import tune
from ray.tune.search.basic_variant import BasicVariantGenerator

algo = BasicVariantGenerator(max_concurrent=2, random_state=0)
search_space = {
    "n_gram_size": tune.grid_search([2, 3]),
    "vector_size": tune.grid_search([300, 1200, 2400]),
    "fasttext_window": tune.grid_search([4, 8]),
    "train_epoch": tune.grid_search([5, 10]),
    "char_ngrams_length": tune.grid_search(list(conditional_max_n_and_min_n())),
}
import numpy as np
from ray import air

# For non-grid-search tuning, specify num_samples (e.g. num_samples=10) in TuneConfig.
tuner = tune.Tuner(
    fasttext_tuning_func,
    tune_config=tune.TuneConfig(
        metric="weighted macro f1-score",
        mode="max",
        search_alg=algo,
        trial_dirname_creator=lambda trial: trial.trainable_name + "_" + trial.trial_id,
    ),
    run_config=air.RunConfig(
        name="fasttext tuning",
        local_dir=ipynb_path + '/fasttext tuning files',
        verbose=1,
    ),
    param_space=search_space,
)
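As a side note on the size of this sweep: since every parameter above uses grid_search, the number of trials is the product of the list lengths. The length of list(conditional_max_n_and_min_n()) depends on my helper, so I use a placeholder of 3 values here just to illustrate the arithmetic:

```python
from math import prod

# Sizes of the grid_search lists above:
# n_gram_size (2), vector_size (3), fasttext_window (2), train_epoch (2),
# char_ngrams_length (placeholder: assume my helper yields 3 values)
grid_sizes = [2, 3, 2, 2, 3]

total_trials = prod(grid_sizes)
print(total_trials)  # 2 * 3 * 2 * 2 * 3 = 72
```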
Restoring tuner experiment:
from ray import tune

results_experiment_path = ipynb_path + '/fasttext tuning files/fasttext tuning'
search_space = {
    "n_gram_size": tune.grid_search([2, 3]),
    "vector_size": tune.grid_search([300, 1200, 2400]),
    "fasttext_window": tune.grid_search([4, 8]),
    "train_epoch": tune.grid_search([5, 10]),
    "char_ngrams_length": tune.grid_search(list(conditional_max_n_and_min_n())),
}
tuner = tune.Tuner.restore(
    results_experiment_path,
    trainable=fasttext_tuning_func,
    param_space=search_space,
    resume_unfinished=True,
    resume_errored=True,
)
I was hyperparameter-tuning a word-embedding model for my use case and would like to continue running the experiment. However, when I try to restore the previous checkpoint, the following warnings are issued:
Basically, the tuner can't find the previous experiment checkpoint and creates a new experiment instead. How can I fix this so the tuner recognizes the previous experiment checkpoint and continues that experiment (rather than starting a new one)?
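For reference, here is a quick sanity check I can run on the directory I pass to Tuner.restore. This is just my own diagnostic sketch, assuming the usual Tune experiment layout where a restorable experiment directory contains a tuner.pkl plus experiment_state-*.json files at its top level (the path below mirrors the local_dir and RunConfig name from my snippets):

```python
import os

# Hypothetical: mirrors local_dir + RunConfig name from the tuner above
experiment_path = os.path.join("fasttext tuning files", "fasttext tuning")

def has_tuner_checkpoint(path):
    """Return True if `path` looks like a restorable Tune experiment directory,
    i.e. it contains a tuner.pkl and at least one experiment_state-*.json."""
    if not os.path.isdir(path):
        return False
    names = os.listdir(path)
    has_pkl = "tuner.pkl" in names
    has_state = any(
        n.startswith("experiment_state") and n.endswith(".json") for n in names
    )
    return has_pkl and has_state

print(has_tuner_checkpoint(experiment_path))
```

If this prints False for the path I'm passing to Tuner.restore, it would at least confirm the tuner is looking in the wrong place (or the checkpoint files were never written there).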