Resume tuning after updating search space with more hyperparameters

I’m trying to resume tuning after expanding the search space with additional hyperparameters. The run won’t start, presumably because the previous and current search spaces no longer match. Is there a way to continue tuning in this case?

Hi,
This is not currently supported.
Would it work to start a new tune run in which each trial loads from its last checkpoint of the previous run?

Resuming works nicely when the search space is unchanged. Is it possible to manually update some configuration files so that it also works with the modified search space?

It is currently not supported. Have you tried my last suggestion? You can start another tune run with the new search space, and each trial can either start fresh or start from a loaded checkpoint (if that hyperparameter combination already appeared in the last tune run).

Can you please be a bit more specific? Currently I use algorithm.restore_from_dir() to resume the run. Is that what you are suggesting?

Which search algorithm do you use? Does it support resuming with an updated hyperparameter space? Beyond the limitations of the specific search algorithm, I don’t think Ray Tune supports that well either.
However, if you just want to continue tuning on, say, a selected subset of the hyperparameter space without wasting all the previous effort, you can do something like the following:

1st run

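# Imports assume a Ray 2.x release that still ships ray.air.session.
from ray import tune
from ray.air import session
from ray.tune import Tuner

def f(config):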
  a = config.get("a")
  b = config.get("b")
  ...
  for i in range(10):
    accuracy = ...
    checkpoint = ...
    session.report({"acc": accuracy}, checkpoint=checkpoint)

tuner = Tuner(f, param_space={"a": tune.grid_search([1, 2, 3]), "b": tune.grid_search([4, 5, 6])})
result = tuner.fit()

2nd run

def get_latest_checkpoint_with_hyparam(checkpoint_dir, a, b):
  ...

checkpoint_dir = "s3://my_bucket/my_run/"
def f(config):
  a = config.get("a")
  b = config.get("b")
  latest_checkpoint_path = get_latest_checkpoint_with_hyparam(checkpoint_dir, a, b)
  if latest_checkpoint_path:
    checkpoint = Checkpoint.from_uri(latest_checkpoint_path)
    load_some_state_from_checkpoint(checkpoint)
  for i in range(10):
    ...
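
For what it's worth, here is a minimal sketch of what the elided get_latest_checkpoint_with_hyparam helper might look like. It is only an illustration under stated assumptions: the previous run's results sit in a local directory (not S3), each trial directory contains a params.json with that trial's config, and checkpoints are stored in zero-padded checkpoint_* subdirectories. The exact layout can differ between Ray versions, so check your own results directory first.

```python
import json
from pathlib import Path
from typing import Optional


def get_latest_checkpoint_with_hyparam(checkpoint_dir: str, a, b) -> Optional[str]:
    """Return the newest checkpoint path of the trial that used (a, b), or None.

    Assumed layout (verify against your own run):
      <checkpoint_dir>/<trial_dir>/params.json
      <checkpoint_dir>/<trial_dir>/checkpoint_000000/, checkpoint_000001/, ...
    """
    for params_file in sorted(Path(checkpoint_dir).glob("*/params.json")):
        params = json.loads(params_file.read_text())
        # Only compare the hyperparameters that existed in the first run.
        if params.get("a") != a or params.get("b") != b:
            continue
        # Zero-padded names sort lexicographically in iteration order.
        checkpoints = sorted(
            p for p in params_file.parent.glob("checkpoint_*") if p.is_dir()
        )
        if checkpoints:
            return str(checkpoints[-1])
    return None
```

Because only "a" and "b" are compared, trials from the first run still match after new hyperparameters are added to the search space; whether reusing their state is meaningful for the new configuration is a separate judgment call.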

Thanks. I’m using hyperopt. I’m assuming it reads the checkpoint through restore_from_dir(), which would be equivalent to the get_latest_checkpoint_with_hyparam function. Is that a correct understanding?

I think get_latest_checkpoint_with_hyparam is only there so that the effort already spent on a particular hyperparameter combination is not wasted.
If you are not concerned about that, and just want to resume searcher state, can you try something like the following in the 2nd run?

hyperopt_searcher = HyperOptSearch(...)
hyperopt_searcher.restore(...)
tuner = Tuner(function, param_space=param_space, tune_config=TuneConfig(search_alg=hyperopt_searcher))
tuner.fit()

?

That’s actually what I did for a different issue in the following thread:

Does that solve the issue for you?

Nope. This is actually not the right way to continue a finished tuning run; it only works for a run that has not finished. The conclusion from that thread is to use algorithm.restore_from_dir() to continue a finished run, which I believe is the same as get_latest_checkpoint_with_hyparam(). In the current case, I can probably limit the search range based on the previous run, or use the best result from the previous run as the initial parameters for the next round of tuning. Unfortunately, the initial parameters for hyperopt can’t be in the form of a list, as reported in the following post:

  https://discuss.ray.io/t/initial-parameter-for-hyperopt/9370

I’ll probably just move on with a reduced search space based on the results of the previous run.
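
In case it helps anyone following the same route, here is a rough plain-Python sketch of narrowing a range from earlier results. Everything in it is an assumption for illustration: the results are taken to be a list of dicts with a "config" entry and an "acc" metric, and the helper names narrowed_range and best_config are made up rather than part of Ray Tune or hyperopt.

```python
from typing import Dict, List, Tuple


def narrowed_range(
    results: List[Dict], name: str, metric: str = "acc", top_k: int = 3
) -> Tuple[float, float]:
    """Return the (low, high) span of `name` over the top_k results by `metric`."""
    ranked = sorted(results, key=lambda r: r[metric], reverse=True)[:top_k]
    values = [r["config"][name] for r in ranked]
    return min(values), max(values)


def best_config(results: List[Dict], metric: str = "acc") -> Dict:
    """Return the config of the single best result by `metric`."""
    return max(results, key=lambda r: r[metric])["config"]


# Hypothetical results from a previous run, for illustration only.
previous = [
    {"config": {"lr": 0.1}, "acc": 0.60},
    {"config": {"lr": 0.01}, "acc": 0.90},
    {"config": {"lr": 0.03}, "acc": 0.85},
    {"config": {"lr": 0.5}, "acc": 0.20},
]
low, high = narrowed_range(previous, "lr", top_k=2)
```

The resulting bounds could then be used for the reduced search space, for example with tune.uniform(low, high), while any newly added hyperparameters keep their full ranges.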

Yeah, I meant to ask whether this works for you (using restore_from_dir rather than Tuner.restore())?

Oh. It won’t work either.