Patience parameter for Ray tune?

Ray Tune was recommended to me by Nico Pinto (he was the first person to train NNs on GPUs, and taught Alex how to do it to set up AlexNet).

I am interested in Ray Tune early stopping (See “How does early termination (e.g. Hyperband/ASHA) work?” in Ray docs).

It appears you have a grace_period that sets the minimum number of epochs, but not a patience parameter (see ‘EarlyStopping’ in the PyTorch Lightning documentation).

The patience parameter is very useful because most ML algorithms have jittery objectives. You don’t want to terminate just because a single epoch temporarily worsens the objective.
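To make the idea concrete, here is a minimal sketch (not any library's actual implementation) of the patience rule: a trial is only stopped after the objective has failed to improve for `patience` consecutive epochs, so a single jittery epoch doesn't trigger termination.

```python
def should_stop(loss_history, patience=5, min_delta=0.0):
    """Return True if the loss failed to improve for `patience`
    consecutive epochs (the classic early-stopping patience rule)."""
    best = float("inf")
    epochs_without_improvement = 0
    for loss in loss_history:
        if loss < best - min_delta:
            # New best value: reset the patience counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return True
    return False
```

Note how a one-epoch blip (e.g. `[1.0, 1.1, 0.9]`) resets the counter and does not stop the trial, which is exactly the behavior a bare grace period doesn't give you.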

Is there a way to implement patience in Ray, so that a trial is not stopped early until it has actually converged? i.e. that Ray terminates a trial early only if the objective fails to improve over a certain number of results?

Unfortunately, this was an issue I had with Optuna, and it is one of the reasons I am considering Ray.

This might be related to Tuning process with PBT is killed after a very small number of iterations (6/500).

Hey @turian!

ASHA is one of those “aggressive” early stopping algorithms. We’ve seen a lot of users ask for other stopping conditions like what you’ve asked, about stopping upon plateau or stopping upon deadline.

Recently, we (@krfricke) merged a new feature that should land in Ray 1.2.0 which has a couple stopping mechanisms out of the box.

You’re probably looking for the TrialPlateauStopper which has the following interface:

class TrialPlateauStopper(Stopper):
    """Early stop single trials when they reached a plateau.

    When the standard deviation of the `metric` result of a trial is
    below a threshold `std`, the trial plateaued and will be stopped
    early.

    Args:
        metric (str): Metric to check for convergence.
        std (float): Maximum metric standard deviation to decide if a
            trial plateaued. Defaults to 0.01.
        num_results (int): Number of results to consider for stdev
            calculation.
        grace_period (int): Minimum number of timesteps before a trial
            can be early stopped.
        metric_threshold (Optional[float]):
            Minimum or maximum value the result has to exceed before it can
            be stopped early.
        mode (Optional[str]): If a `metric_threshold` argument has been
            passed, this must be one of [min, max]. Specifies if we optimize
            for a large metric (max) or a small metric (min). If max, the
            `metric_threshold` has to be exceeded, if min the value has to
            be lower than `metric_threshold` in order to early stop.
    """

The grace_period and std are probably what you’re looking for.
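The core rule behind the stopper can be sketched in plain Python (this is a simplified sketch of the semantics described in the docstring above, not Ray's actual implementation): once `grace_period` results have been seen, the trial is stopped when the standard deviation of the last `num_results` metric values falls below `std`.

```python
import statistics

def plateaued(results, std=0.01, num_results=4, grace_period=4):
    """Sketch of the plateau rule: stop when the stdev of the last
    `num_results` metric values drops below `std`, but never before
    `grace_period` results have been reported."""
    if len(results) < max(grace_period, num_results):
        return False
    window = results[-num_results:]
    return statistics.stdev(window) < std
```

In practice you would just construct the stopper and pass it to Tune, e.g. `tune.run(trainable, stop=TrialPlateauStopper(metric="loss", std=0.005, num_results=10, grace_period=10))` (parameter values here are illustrative, pick ones suited to how noisy your metric is).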

Let me know if that works (or not) for you, and do feel free to follow up with any other questions!