How to restore after crash

I’m fairly new to Ray Tune.

I was running a hyperparameter optimisation with Ray Tune. After many hours the system crashed, and I am now trying to recover the experiment from the ray_results directory. I did not specify a name in the tune.run call, so I do not know what name to use for the experiment. I am not sure what the best way to proceed is, or whether I can save what I have already done.

Any help will be appreciated.

Thanks.

Try checking this directory:
~\ray_results

If you created checkpoints and the results path wasn’t changed, the checkpoints will probably be in this directory.
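A quick way to check is to search the results directory for folders with "checkpoint" in their name. This is only a minimal sketch using the Python standard library; the helper name find_checkpoint_dirs is made up for illustration, and the default results path is assumed to be unchanged.

```python
from pathlib import Path


def find_checkpoint_dirs(results_root):
    """Return all directories under results_root whose name contains 'checkpoint'."""
    root = Path(results_root).expanduser()
    if not root.is_dir():
        # Nothing to search if the results directory does not exist
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_dir() and "checkpoint" in p.name.lower()
    )


if __name__ == "__main__":
    # Assumes the default results location was not changed
    for path in find_checkpoint_dirs("~/ray_results"):
        print(path)
```

If this prints nothing, no checkpoint folders were written under that path.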

Peter


Hey @Paul_V,

If you’re not using tune.grid_search, you should be able to call tune.run(..., resume=True), keeping all other arguments the same as in your previous tuning run.
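Roughly, that could look like the sketch below. It assumes the tune.run API with its name, config, local_dir and resume arguments. The wrapper resume_tune_run and its parameter names are hypothetical; experiment_name is assumed to be whatever folder name the original run created under the results directory.

```python
def resume_tune_run(trainable, experiment_name, config, local_dir="~/ray_results"):
    """Re-run tune.run with resume=True and otherwise unchanged arguments.

    trainable, experiment_name and config are placeholders: pass the same
    trainable and search space that the crashed run used.
    """
    # Imported here so the helper can be defined even where Ray is not installed
    from ray import tune

    return tune.run(
        trainable,
        name=experiment_name,  # assumed to match the existing experiment folder
        config=config,
        local_dir=local_dir,
        resume=True,
    )
```

Resuming can only pick up state that was actually saved to disk, so this will not help if nothing was checkpointed.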

If you are using tune.grid_search, maybe this PR that I’m working on right now will help solve your issue?

Curious, what was the reason for the crash?

Sadly, function checkpointing was disabled. But I can see files in ~\ray_results. Do these contain information that I could recover?

@rliaw, when I tried to run with resume=True I got an error message telling me that no checkpoints exist.

@Paul_V If you don’t have a directory with “checkpoint” in its name, I’m afraid you can’t recover a checkpoint.
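That said, the metrics that were already reported may still be readable even without checkpoints. The sketch below assumes each trial folder contains a result.json file with one JSON record per line, which is what Tune's default JSON logger writes; the exact layout may differ between versions, and the helper name load_last_results is made up for illustration. It does not restore any training state, it only reads back what was logged.

```python
import json
from pathlib import Path


def load_last_results(results_root):
    """Map each trial directory to the last complete record in its result.json."""
    root = Path(results_root).expanduser()
    recovered = {}
    if not root.is_dir():
        return recovered
    for result_file in sorted(root.rglob("result.json")):
        last = None
        with result_file.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    last = json.loads(line)
                except json.JSONDecodeError:
                    # A crash can leave a truncated line; skip it and keep
                    # the most recent record that parsed cleanly
                    continue
        if last is not None:
            recovered[str(result_file.parent)] = last
    return recovered


if __name__ == "__main__":
    for trial_dir, record in load_last_results("~/ray_results").items():
        print(trial_dir, record.get("training_iteration"))
```

Each trial folder should normally also hold a params.json with the sampled hyperparameters, so the best configurations found so far can at least be noted and re-run.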