Ray Tune was recommended to me by Nico Pinto (he was the first person to train NNs on GPUs, and taught Alex how to do it to set up AlexNet).
I am interested in Ray Tune early stopping (see “How does early termination (e.g. Hyperband/ASHA) work?” in the Ray docs).
It appears you have a grace_period that sets the minimum number of epochs, but not a patience parameter (see ‘Early Stopping’ in the PyTorch Lightning documentation).
The patience parameter is very useful because most ML algorithms have jittery objectives. You don’t want to terminate if one single epoch increases the objective temporarily.
Is there a way to implement patience in Ray Tune, so that a trial is not stopped early until it has actually stopped improving? I.e., Ray terminates a trial early only if the objective has not improved after a certain number of epochs?
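To make concrete what I mean by patience, here is a minimal plain-Python sketch. It is not Ray or PyTorch Lightning code: `PatienceTracker`, `update`, and `min_delta` are hypothetical names for illustration, and it assumes a loss-like objective where lower is better.

```python
class PatienceTracker:
    """Hypothetical helper: stop only after `patience` consecutive
    epochs without improvement (assumes lower objective is better)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # non-improving epochs to tolerate
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best = float("inf")      # best objective seen so far
        self.bad_epochs = 0           # consecutive epochs without improvement

    def update(self, value):
        """Record one epoch's objective; return True if training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_epochs = 0       # any improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# A jittery loss curve: epoch 2 is a temporary increase and must not stop training.
losses = [1.0, 0.8, 0.85, 0.7, 0.72, 0.71, 0.73, 0.6]
tracker = PatienceTracker(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses):
    if tracker.update(loss):
        stopped_at = epoch
        break
```

With `patience=3`, the single bad epoch at index 2 is ignored, and the run only stops once three epochs in a row fail to beat the best loss so far.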
ASHA is one of those “aggressive” early stopping algorithms. We’ve seen a lot of users ask for other stopping conditions like the one you describe, such as stopping on a plateau or stopping at a deadline.
Recently, we (@krfricke) merged a new feature that should land in Ray 1.2.0, which has a couple of stopping mechanisms out of the box.
You’re probably looking for the TrialPlateauStopper which has the following interface:
    class TrialPlateauStopper(Stopper):
        """Early stop single trials when they reached a plateau.

        When the standard deviation of the `metric` result of a trial is
        below a threshold `std`, the trial plateaued and will be stopped
        early.

        Args:
            metric (str): Metric to check for convergence.
            std (float): Maximum metric standard deviation to decide if a
                trial plateaued. Defaults to 0.01.
            num_results (int): Number of results to consider for stdev
                calculation.
            grace_period (int): Minimum number of timesteps before a trial
                can be early stopped
            metric_threshold (Optional[float]):
                Minimum or maximum value the result has to exceed before it can
                be stopped early.
            mode (Optional[str]): If a `metric_threshold` argument has been
                passed, this must be one of [min, max]. Specifies if we optimize
                for a large metric (max) or a small metric (min). If max, the
                `metric_threshold` has to be exceeded, if min the value has to
                be lower than `metric_threshold` in order to early stop.
        """
The grace_period and std are probably what you’re looking for.
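To show roughly how the criterion in that docstring behaves, here is a small plain-Python sketch. It is not the Ray implementation: `PlateauStopperSketch` is a hypothetical stand-in that only mimics the shape of a Tune stopper (a callable taking `trial_id` and `result`, plus `stop_all`), the default values for `num_results` and `grace_period` are illustrative assumptions, and it leaves out `metric_threshold` and `mode`.

```python
from collections import defaultdict, deque
from statistics import pstdev


class PlateauStopperSketch:
    """Hypothetical stand-in for the plateau criterion described above."""

    def __init__(self, metric, std=0.01, num_results=4, grace_period=4):
        self.metric = metric
        self.std = std                    # plateau threshold on the std dev
        self.num_results = num_results    # window size (illustrative default)
        self.grace_period = grace_period  # min results before stopping (illustrative default)
        self._iters = defaultdict(int)
        # One sliding window of recent metric values per trial.
        self._window = defaultdict(lambda: deque(maxlen=num_results))

    def __call__(self, trial_id, result):
        """Return True if this trial looks plateaued and should stop."""
        self._iters[trial_id] += 1
        self._window[trial_id].append(result[self.metric])
        if self._iters[trial_id] < self.grace_period:
            return False                  # still inside the grace period
        if len(self._window[trial_id]) < self.num_results:
            return False                  # not enough results for a full window
        return pstdev(self._window[trial_id]) < self.std

    def stop_all(self):
        """This sketch never stops the whole experiment."""
        return False


stopper = PlateauStopperSketch("loss", std=0.01, num_results=3, grace_period=4)
losses = [1.0, 0.6, 0.4, 0.31, 0.30, 0.305, 0.30]
decisions = [stopper("trial_0", {"loss": v}) for v in losses]
```

Here the trial keeps running while the loss is still dropping and is only flagged once the last three values barely move. Assuming the feature ships as described, the real stopper would presumably be passed through the `stop` argument of `tune.run`. Note that a larger `num_results` plays a role loosely similar to patience, since one noisy epoch inside the window raises the standard deviation and delays the stop.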
Let me know if that works (or not) for you, and do feel free to follow up with any other questions!