How to stop an experiment when the number of ERROR trials exceeds a limit, e.g., >= 3

metaphor · December 21, 2021, 2:22am

If my trainable has a bug, Tune will still run every trial, and all of them end in ERROR.
I would like to stop the experiment once the maximum allowed number of ERROR trials is exceeded.
There is a tune.run(fail_fast=True) parameter, but it stops the experiment on the first error, which is not what I want.
Could you please suggest how to do this?

matthewdeng · December 21, 2021, 9:00am

Hey @metaphor, could you explain the use-case a little more? Would it be reasonable to try running your Tune job with a smaller number of samples first to verify that there are no bugs before proceeding with a full sweep?

metaphor · December 21, 2021, 9:21am

It's a production AutoML platform based on Ray. We can't know about "bugs" in advance, since user inputs vary.
It can run 1000+ trials in some cases. We just want a way to stop the experiment when trials keep ending in ERROR, so the user doesn't have to wait too long.

matthewdeng · December 21, 2021, 5:37pm

Gotcha, that makes sense!

I took a look and saw that our current Stopper API doesn't have access to this information. I've created a GitHub issue to track this here: [Feature][Tune] Trial status based Stopper · Issue #21222 · ray-project/ray · GitHub

metaphor · December 22, 2021, 1:19am

Thanks, I will follow it.

Meanwhile, a workaround could be to wrap the Searcher (much like ConcurrencyLimiter does), since it has the lifecycle callback on_trial_complete(error: bool = False): count the errored trials, and stop suggesting new ones by returning Searcher.FINISHED once the tolerance is exceeded.
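A minimal sketch of that wrapper idea follows. It assumes the Ray Tune Searcher interface (suggest(trial_id), on_trial_result(trial_id, result), on_trial_complete(trial_id, result=None, error=False)) and that returning Searcher.FINISHED (the string "FINISHED") from suggest tells Tune to stop requesting new trials. The names ErrorToleranceLimiter and ToySearcher and the max_errors/consecutive parameters are hypothetical. So that it runs without Ray installed, it is written as plain Python that delegates to the wrapped searcher. In a real project you would subclass Searcher and forward the remaining methods (metric/mode handling, save/restore) the way ConcurrencyLimiter does.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in for ray.tune Searcher.FINISHED (the string "FINISHED" in Ray).
# In a real project, subclass Searcher and return Searcher.FINISHED instead.
FINISHED = "FINISHED"


class ErrorToleranceLimiter:
    """Hypothetical wrapper, modeled on ConcurrencyLimiter, that stops
    suggesting new trials once too many trials have ended in ERROR."""

    def __init__(self, searcher, max_errors: int = 3, consecutive: bool = False):
        self.searcher = searcher        # the wrapped search algorithm
        self.max_errors = max_errors    # tolerance, e.g. stop at >= 3 errors
        self.consecutive = consecutive  # if True, a success resets the count
        self.num_errors = 0
        self._finished = False          # sticky: once stopped, stay stopped

    def suggest(self, trial_id: str) -> Optional[Dict]:
        # Once the tolerance is reached, signal that no more trials will come.
        if self._finished or self.num_errors >= self.max_errors:
            self._finished = True
            return FINISHED
        return self.searcher.suggest(trial_id)

    def on_trial_result(self, trial_id: str, result: Dict) -> None:
        self.searcher.on_trial_result(trial_id, result)

    def on_trial_complete(self, trial_id: str, result: Optional[Dict] = None,
                          error: bool = False) -> None:
        if error:
            self.num_errors += 1
        elif self.consecutive:
            # A successful trial breaks the run of consecutive errors.
            self.num_errors = 0
        self.searcher.on_trial_complete(trial_id, result=result, error=error)


class ToySearcher:
    """Toy inner searcher for demonstration only: always suggests one config."""

    def __init__(self) -> None:
        self.completed: List[Tuple[str, bool]] = []

    def suggest(self, trial_id: str) -> Optional[Dict]:
        return {"lr": 0.01}

    def on_trial_result(self, trial_id: str, result: Dict) -> None:
        pass

    def on_trial_complete(self, trial_id: str, result: Optional[Dict] = None,
                          error: bool = False) -> None:
        self.completed.append((trial_id, error))


if __name__ == "__main__":
    limiter = ErrorToleranceLimiter(ToySearcher(), max_errors=3)
    for i in range(3):
        print(limiter.suggest(f"trial_{i}"))
        limiter.on_trial_complete(f"trial_{i}", error=True)
    print(limiter.suggest("trial_3"))  # tolerance reached
```

Once made into a real Searcher subclass, it would be passed as the search algorithm (e.g. search_alg=ErrorToleranceLimiter(my_searcher, max_errors=3)). Returning FINISHED only stops new trials from being suggested; trials that are already running are allowed to finish. The consecutive option covers the "continuous ERROR trials" case, where one successful trial resets the count.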
