Correct way of using tuner.restore()

Here is a simple example of running Ray Tune and resuming the tuning after the 1st run. The resumed run returns exactly the same score as the 1st run, but I would expect the score to be updated on every re-run. Am I using the restore function incorrectly? Any help is appreciated.

  import os
  from ray import tune, air
  from hyperopt import hp
  from ray.tune.search.hyperopt import HyperOptSearch
  from ray.air import session
  from ray.air.checkpoint import Checkpoint

  # 1. Define an objective function.
  def objective(config):
      score = config["a"] ** 2 + config["b"]
      session.report({'SCORE':score})


  # 2. Define a search space.
  search_space = {
      "a": hp.uniform("a", 0, 1),
      "b": hp.uniform("b", 0, 1)
      }

  raw_log_dir = "ray_log"
  raw_log_name = "example"
  log_dir = os.path.join(os.getcwd(), raw_log_dir, raw_log_name)
  if not os.path.exists(log_dir):
      print('--- this is the 1st run ----')
      algorithm = HyperOptSearch(search_space, metric="SCORE", mode="max")
      tuner = tune.Tuner(objective,
              tune_config = tune.TuneConfig(
                  num_samples = 2, # number of tries. too expensive for Brian2
                  search_alg=algorithm,
                  ),
              param_space=search_space,
              run_config = air.RunConfig(local_dir = raw_log_dir, name = raw_log_name) # where to save the log which will be loaded later
              )
  else: #note: restoring described here doesn't work: https://docs.ray.io/en/latest/tune/tutorials/tune-stopping.html 
      print('--- previous run exist, continue the tuning ----')
      algorithm = HyperOptSearch(search_space, metric="SCORE", mode="max")
      tuner = tune.Tuner.restore(log_dir)

  results = tuner.fit()
  print(results.get_best_result(metric="SCORE", mode="max").config)

This is getting a bit frustrating. The Ray Tune documentation on the restore feature seems to be either outdated or not yet implemented. A concrete example that actually works would be appreciated by new users.

Instead of using the tuner.restore() function, I used algorithm.restore_from_dir(). Now the output from every run is different, but the results look random, i.e. there is no improvement compared to the previous run. The question is which method is the right one to use. Here is a version of the code using algorithm.restore_from_dir():

  import os
  from ray import tune, air
  from hyperopt import hp
  from ray.tune.search.hyperopt import HyperOptSearch


  # 1. Define an objective function.
  def objective(config):
      score = config["a"] ** 2 + config["b"]
      # "SCORE" first appears here; this key defines the metric name, i.e. metric="SCORE"
      tune.report(SCORE=score)  # either this or `return {"SCORE": score}` works


  # 2. Define a search space.
  search_space = {
      "a": hp.uniform("a", 0, 1),
      "b": hp.uniform("b", 0, 1)
      }

  raw_log_dir = "./ray_log"
  raw_log_name = "example"

  algorithm = HyperOptSearch(search_space, metric="SCORE", mode="max")
  if not os.path.exists(os.path.join(raw_log_dir, raw_log_name)):
      print('--- this is the 1st run ----')
  else: #note: restoring described here doesn't work: https://docs.ray.io/en/latest/tune/tutorials/tune-stopping.html 
      print('--- previous run exist, continue the tuning ----')
      algorithm.restore_from_dir(os.path.join(raw_log_dir, raw_log_name))

  # 3. Start a Tune run and print the best result.
  trainable_with_resources = tune.with_resources(objective, {"cpu": 8})
  tuner = tune.Tuner(trainable_with_resources,
          tune_config = tune.TuneConfig(
              num_samples = 2, # number of tries. too expensive for Brian2
              search_alg=algorithm,
              ),
          param_space=search_space,
          run_config = air.RunConfig(local_dir = raw_log_dir, name = raw_log_name) # where to save the log which will be loaded later
          )

  results = tuner.fit()
  print(results.get_best_result(metric="SCORE", mode="max").config)

According to the source code comments, the restore() function is only for "Restores Tuner after a previously failed run." If a previous run did not fail, the restore() function will just print out the tuned result from the previous run.

Ah, it looks like you already tried restoring the search algorithm - that is the approach I recommended in this thread: tuner.restore() won't make progress · Issue #30223 · ray-project/ray · GitHub.

Regarding your comment:

Now the output results from every round of run is different but the result looks random, i.e. without improvement compared to previous run.

You may still need to increase the number of samples, or consider changing the n_initial_points parameter passed into HyperOptSearch: see the API reference here.


A few questions I have about your experience trying Tuner.restore:

  1. What functionality did you expect from Tuner.restore, and what were the biggest gaps?
  2. Which part of the documentation seems outdated/not implemented? What could be added to the docs to make this less confusing?

In the following link:
Stopping and Resuming a Tune Run — Ray 2.1.0

it says "If you've stopped a run and want to resume from where you left off, you can then call Tuner.restore() like this:". It would be clearer to add "where you left off for unfinished trials, as well as giving you the option to restart or resume errored trials", because one has the option to stop the run programmatically.

It would be nice if restore() could continue a finished run in case a user wants to add more samples. In that case, restore() could just call, e.g., algorithm.restore_from_dir(). Then one could use restore() to continue tuning in all conditions, with no need for a different restore function in each condition.