tune.Tuner.restore bug?

Hi

Ray 2.1.0

It seems that restoring a failed run that uses a scheduler, with something like this:

tuner = tune.Tuner.restore("…")
results = tuner.fit()

fails.

When num_samples is larger than what the available resources can run at once, the previously paused trials don’t restart when it should be their turn, nor does the scheduler (here PB2) seem to be reactivated.

Output of initial fit():

Above you can see the PBT algorithm (here PB2) running with checkpoints and perturbations.
However, when trying to restore a failed run (stopped by pressing Ctrl+C), this seems to fail, with output looking like this:

Only the trial that was running before is restarted, not the ones that were paused when the “failure” occurred, even though far more time steps were performed beyond those of the paused trials.

Sample code for running this can be found here

BR

Jorgen

Hey @Jorgen_Svane,

Thanks for the detailed summary! This brings up two issues that need to be fixed:

  1. Schedulers are not loaded back correctly on restoration when using Tuner.restore(). The restored experiment defaults to the FIFOScheduler as you can see from the status log. The FIFO scheduler doesn’t handle paused trials, which is why you only see the running trial making progress. This will happen if you use any scheduler - not just PBT.
  2. Most schedulers such as PBT/PB2 don’t have save/restore functionality implemented. I’ll look into this and keep you updated.
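
To make the second point concrete, here is a rough sketch of what save/restore for a scheduler means in general. It is a toy model and not Ray's code: the class `ToyPerturbingScheduler`, its `on_result`, `save` and `restore` methods, and the file name `scheduler.pkl` are all hypothetical names made up for illustration. The idea is only that a PBT-style scheduler keeps per-trial bookkeeping, and that this bookkeeping is lost unless it is written out and read back when the experiment is restored.

```python
import pickle
import tempfile
from pathlib import Path


class ToyPerturbingScheduler:
    """Hypothetical stand-in for a PBT-style scheduler (not Ray's API)."""

    def __init__(self, perturbation_interval: int):
        self.perturbation_interval = perturbation_interval
        # Per-trial bookkeeping: the step at which each trial was last perturbed.
        # A scheduler re-created with defaults on restore would start empty here.
        self.last_perturbation_step = {}

    def on_result(self, trial_id: str, step: int) -> bool:
        """Return True when the trial is due for a perturbation at this step."""
        last = self.last_perturbation_step.get(trial_id, 0)
        if step - last >= self.perturbation_interval:
            self.last_perturbation_step[trial_id] = step
            return True
        return False

    def save(self, path: Path) -> None:
        """Write the scheduler's full state to disk."""
        path.write_bytes(pickle.dumps(self.__dict__))

    def restore(self, path: Path) -> None:
        """Overwrite this scheduler's state with the saved one."""
        self.__dict__.update(pickle.loads(path.read_bytes()))


def demo() -> ToyPerturbingScheduler:
    original = ToyPerturbingScheduler(perturbation_interval=4)
    original.on_result("trial_a", 4)  # records a perturbation at step 4

    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "scheduler.pkl"
        original.save(checkpoint)

        # Simulate a restart: a freshly constructed scheduler picks up the
        # saved state instead of starting from its constructor defaults.
        fresh = ToyPerturbingScheduler(perturbation_interval=1)
        fresh.restore(checkpoint)
    return fresh
```

Calling `demo()` returns a scheduler that remembers both the original interval and the perturbation recorded for `trial_a`, which is the behaviour a restored experiment would need from its scheduler.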

I’ve opened an issue on GitHub here: [Tune] `Tuner.restore` doesn't restore schedulers properly · Issue #30838 · ray-project/ray.