I am trying to make Ray Tune with wandb stop the experiment under certain conditions, but Tune just runs indefinitely and ignores all of my stopping conditions. I am using Ray 1.10.0.
I want it to:

- stop the whole experiment if any trial raises an exception (so I can fix the code and resume)
- stop if my score reaches -999
- stop if the variable `varcannotbezero` becomes 0
None of the following attempts produced the desired behavior:

- `stop={"score": -999, "varcannotbezero": 0}`
- `max_failures=0`

Defining a `Stopper` class did not work either:
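```python
import time

from ray.tune.stopper import Stopper


class RayStopper(Stopper):
    def __init__(self):
        self._start = time.time()
        # self._deadline = 300
        self.score = None
        self.varcannotbezero = None

    def __call__(self, trial_id, result):
        # Remember the latest reported values; never stop an individual trial.
        self.score = result["score"]
        self.varcannotbezero = result["varcannotbezero"]
        return False

    def stop_all(self):
        # Stop the whole experiment if the latest result hit a stop condition.
        return self.score == -999 or self.varcannotbezero == 0
```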
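For comparison, here is a minimal sketch of a stopper that latches on the first bad result, assuming Ray 1.x's `Stopper` interface: returning True from `__call__(trial_id, result)` stops that trial, and returning True from `stop_all()` ends the whole run. In the version above, every reported result overwrites `self.score` and `self.varcannotbezero`, so a good result from another trial can mask a bad one before `stop_all()` is polled. The same reasoning applies to the other attempts: a `stop` dict stops a trial once `result[key] >= value`, so `{"score": -999, "varcannotbezero": 0}` ends individual trials right away but never the experiment, and `max_failures` only controls how often a failed trial is retried. The `try`/`except` import fallback is an assumption added only so the logic can be exercised without Ray installed; it has not been tested against the original setup.

```python
try:
    from ray.tune.stopper import Stopper
except ImportError:  # fallback only so the logic can be tried without Ray installed
    Stopper = object


class RayStopper(Stopper):
    """Stop every trial, and the whole experiment, once any trial hits a stop condition."""

    def __init__(self):
        self._triggered = False

    def __call__(self, trial_id, result):
        # Called for each reported result; once triggered, it stays triggered.
        if result.get("score") == -999 or result.get("varcannotbezero") == 0:
            self._triggered = True
        return self._triggered

    def stop_all(self):
        # Polled by Tune; returning True ends the entire experiment.
        return self._triggered
```

It would be passed as `tune.run(tune_obj, stop=RayStopper(), ...)`. Keeping a single sticky flag means the decision cannot be undone by later results from other trials.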
My optimization function calls a script in the background; I am wrapping Tune around it.
@amogkam: I tried raising errors inside `myscript()` and inside `tune_obj`, but neither stops the experiment:
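```python
import uuid

from ray import tune
from ray.tune.error import TuneError

# myscript, Project, local_dir, window and Backtest are defined elsewhere in my script.


def evaluation_fn(config):
    # Hyperparameters from the search space (keys prefixed with "HP_")
    paramDict = {k: v for k, v in config.items() if k.startswith("HP_")}
    # Random 16-digit ID for this run of the script
    Trial = str(uuid.uuid4().int >> 64)[0:16]
    # run script
    Result = myscript(Project, Trial, local_dir, window)
    return Result, Trial


def tune_obj(config, checkpoint_dir=None):
    Result, Trial = evaluation_fn(config)
    if len(Result) == 0:
        tune.report(score=-999.0, Backtest=Backtest, varcannotbezero=0)
        raise TuneError("ERROR: Result empty")
        # raise Exception("ERROR: Result empty")
    else:
        Result = {f"Lower{k}": v[0] for k, v in Result.items()}
        tune.report(score=Result['score'], varcannotbezero=1, **Result)
```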
Hello @amogkam, do you know how I can resume an experiment after it has met the Stopper conditions?
Ideally I want it to stop, let me inspect my experiment, and then continue from the stopping point, so that the hyperopt optimizer doesn't start from scratch.
I tried manually deleting the `tune_obj_e65e1e75` folder of the stopped trial, but when I resume the experiment, it just says "finished".
Inside your training function, raise an error if any of the stopping criteria are met.
Specify `fail_fast=True` in `tune.run`. This stops the entire experiment as soon as any trial raises an error (and therefore as soon as any trial reaches a stopping condition).
You can then resume the experiment, since it ends in an error state rather than being finished.
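The three steps above might look roughly like this in Ray 1.x. This is a sketch, not the poster's code: `StopConditionReached`, `evaluate`, the experiment `name`, the search space and `num_samples` are hypothetical placeholders, and the hyperopt `search_alg` from the original setup would be added back to `tune.run`. Ray is imported inside the functions only so that `check_stop_conditions` can be tried without Ray installed. After fixing the code, calling `run_experiment(resume=True)` with the same `name` and `local_dir` should continue the experiment; how the errored trial itself is handled on resume may depend on the Ray version.

```python
class StopConditionReached(Exception):
    """Hypothetical error raised when a stopping criterion is met."""


def check_stop_conditions(result):
    # Raising (instead of returning) makes Tune mark the trial as ERRORED.
    if result.get("score") == -999 or result.get("varcannotbezero") == 0:
        raise StopConditionReached(f"stopping condition met: {result}")
    return result


def evaluate(config):
    # Hypothetical stand-in for the real evaluation (e.g. running myscript).
    return {"score": config["x"], "varcannotbezero": 1}


def tune_obj(config, checkpoint_dir=None):
    from ray import tune

    result = check_stop_conditions(evaluate(config))
    tune.report(**result)


def run_experiment(resume=False):
    from ray import tune

    return tune.run(
        tune_obj,
        name="stop_on_error_demo",  # hypothetical; reuse the same name to resume
        local_dir="./ray_results",
        config={"x": tune.uniform(-1000, 10)},  # hypothetical search space
        num_samples=20,
        fail_fast=True,  # any errored trial ends the whole experiment
        resume=resume,
    )
```

The check is kept in a plain function so the stopping logic lives in one place and can be unit-tested separately from Tune.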