Resume tuning after updating search space with more hyperparameters

wxie2013 · February 13, 2023, 2:41pm

I’m trying to resume tuning after expanding the search space by adding more hyperparameters. The run won’t start, I guess because of the mismatch between the previous and current search spaces. Is there a way to continue tuning in this case?

xwjiang2010 · February 13, 2023, 4:41pm

Hi,
This is not supported right now.
Would it work if you just start a new tune run, with each trial loading its latest checkpoint from the previous run?

wxie2013 · February 14, 2023, 3:51pm

Resuming tuning works nicely when the search space is unchanged. Is it possible to manually update some configuration files so that it also works with the modified search space?

xwjiang2010 · February 14, 2023, 3:53pm

It is currently not supported. Have you tried my last suggestion? You can start another tune run with the new search space, and each trial can either start fresh or start from a loaded checkpoint (if its hyperparameter combination already appeared in the last tune run).

wxie2013 · February 14, 2023, 4:19pm

Can you please be a bit more specific? Currently I use algorithm.restore_from_dir() to resume the run. Is that what you are suggesting?

xwjiang2010 · February 14, 2023, 5:06pm

What algorithm do you use? Does it support resuming with an updated hyperparameter space? Beyond the limitations of the specific search algorithm, I don’t think Ray Tune supports that well either.
However, if you just want to continue tuning with, say, a selected subspace of the hyperparameters, without wasting all the previous effort, you can do something like the following:

1st run

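from ray import tune
from ray.air import session
from ray.tune import Tuner

def f(config):
  a = config.get("a")
  b = config.get("b")
  ...
  for i in range(10):
    accuracy = ...
    checkpoint = ...  # checkpoint holding the training state at this step
    session.report({"acc": accuracy}, checkpoint=checkpoint)

# Grid over every combination of a and b.
tuner = Tuner(f, param_space={"a": tune.grid_search([1, 2, 3]), "b": tune.grid_search([4, 5, 6])})
result = tuner.fit()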

2nd run

from ray.air import Checkpoint

def get_latest_checkpoint_with_hyparam(checkpoint_dir, a, b):
  # Look up the latest checkpoint from the 1st run for this (a, b), if any.
  ...

checkpoint_dir = "s3://my_bucket/my_run/"

def f(config):
  a = config.get("a")
  b = config.get("b")
  latest_checkpoint_path = get_latest_checkpoint_with_hyparam(checkpoint_dir, a, b)
  if latest_checkpoint_path:
    # This combination was already tried: continue from its saved state.
    checkpoint = Checkpoint.from_uri(latest_checkpoint_path)
    load_some_state_from_checkpoint(checkpoint)
  for i in range(10):
    ...
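For illustration only, here is a minimal sketch of what a helper like get_latest_checkpoint_with_hyparam could look like. It is not part of Ray Tune, and it rests on assumptions that are not stated in this thread: the previous run’s results are in a local directory (not S3), each trial has its own sub-directory with a params.json file holding its config, and checkpoints are saved in zero-padded checkpoint_NNNNNN sub-directories. Only the keys a and b are compared, so hyperparameters added in the 2nd run don’t prevent a match.

```python
import json
import os
from typing import Optional


def get_latest_checkpoint_with_hyparam(checkpoint_dir: str, a, b) -> Optional[str]:
    """Return the newest checkpoint directory of a previous trial that used
    the given (a, b), or None if no such trial or checkpoint exists.

    Assumed layout (hypothetical, adjust to your run):
      <checkpoint_dir>/<trial>/params.json
      <checkpoint_dir>/<trial>/checkpoint_000009/
    """
    latest = None  # (checkpoint name, full path) of the best match so far
    for trial_name in sorted(os.listdir(checkpoint_dir)):
        trial_dir = os.path.join(checkpoint_dir, trial_name)
        params_file = os.path.join(trial_dir, "params.json")
        if not os.path.isfile(params_file):
            continue  # not a trial directory
        with open(params_file) as fh:
            params = json.load(fh)
        # Compare only a and b; extra keys in either run are ignored.
        if params.get("a") != a or params.get("b") != b:
            continue
        # Zero-padded names sort lexicographically in training order.
        names = sorted(
            d for d in os.listdir(trial_dir)
            if d.startswith("checkpoint_")
            and os.path.isdir(os.path.join(trial_dir, d))
        )
        if names and (latest is None or names[-1] > latest[0]):
            latest = (names[-1], os.path.join(trial_dir, names[-1]))
    return latest[1] if latest else None
```

Because this version returns a local path rather than a URI, the trainable would load it with Checkpoint.from_directory(path) instead of Checkpoint.from_uri(...).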

wxie2013 · February 14, 2023, 5:12pm

Thanks. I’m using hyperopt. I’m assuming it reads the checkpoint using restore_from_dir(), which is equivalent to the get_latest_checkpoint_with_hyparam function. Is that a correct understanding?

xwjiang2010 · February 14, 2023, 5:47pm

I think get_latest_checkpoint_with_hyparam is only there so that you don’t waste the effort already accumulated on a particular hyperparameter combination.
If you are not concerned about that and just want to resume the searcher state, can you try something like the following in the 2nd run?

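from ray.tune import Tuner, TuneConfig
from ray.tune.search.hyperopt import HyperOptSearch

hyperopt_searcher = HyperOptSearch(...)
hyperopt_searcher.restore(...)  # load the searcher state saved by the previous run
tuner = Tuner(function, param_space=param_space, tune_config=TuneConfig(search_alg=hyperopt_searcher))
tuner.fit()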

?

wxie2013 · February 14, 2023, 8:04pm

That’s actually what I did for a different issue in the following thread:

xwjiang2010 · February 14, 2023, 8:40pm

Does that solve the issue for you?

wxie2013 · February 14, 2023, 10:09pm

Nope. This is actually not the right way to continue a finished tune run. It only works for a run that hasn’t finished. The conclusion from that thread was to use algorithm.restore_from_dir() to continue a finished run, which I believe is the same as get_latest_checkpoint_with_hyparam(). In the current case, I can probably limit the search range based on the previous run, or use the best result from the previous run as the initial parameters for the next round of tuning. Unfortunately, the initial parameters for hyperopt can’t be given as a list, as reported in the following post:

  https://discuss.ray.io/t/initial-parameter-for-hyperopt/9370

I’ll probably just move on with a reduced search space based on the results of the previous tune run.
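One way to derive such a reduced search space is to take, for each hyperparameter, the range covered by the best few configurations of the previous run. The sketch below is only an illustration of that idea: narrowed_bounds is a hypothetical helper, the (config, accuracy) pairs are made-up toy values, and higher accuracy is assumed to be better.

```python
from typing import Dict, List, Tuple


def narrowed_bounds(
    results: List[Tuple[Dict[str, float], float]], top_k: int = 3
) -> Dict[str, Tuple[float, float]]:
    """Return the (min, max) of each hyperparameter over the top_k
    configurations, ranked by score with higher being better."""
    if not results:
        raise ValueError("no previous results to narrow from")
    best = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
    bounds: Dict[str, Tuple[float, float]] = {}
    for config, _score in best:
        for name, value in config.items():
            lo, hi = bounds.get(name, (value, value))
            bounds[name] = (min(lo, value), max(hi, value))
    return bounds


# Toy (config, accuracy) pairs standing in for the previous run's results.
previous = [
    ({"a": 1, "b": 4}, 0.71),
    ({"a": 2, "b": 5}, 0.90),
    ({"a": 3, "b": 6}, 0.88),
    ({"a": 1, "b": 6}, 0.65),
]
bounds = narrowed_bounds(previous, top_k=2)
print(bounds)  # {'a': (2, 3), 'b': (5, 6)}
```

Each (lo, hi) pair could then be turned into a range such as tune.uniform(lo, hi) in the new param_space, with the additional hyperparameters added alongside.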

xwjiang2010 · February 14, 2023, 11:24pm

Yeah, I meant to ask whether this works for you (using restore_from_dir rather than Tuner.restore()).

wxie2013 · February 15, 2023, 1:07pm

Oh. It won’t work either.
