[tune] Using an experiment-wide Stopper sometimes terminates prematurely

Hello, I have been using Tune (1.2.0dev) without any problems, but recently I have run into issues with experiment-wide stoppers such as ExperimentPlateauStopper and my own custom stopper. Sometimes the experiment exits before it is supposed to (the stopping condition is clearly not met), and when this happens it outputs the following warning: Skipping cleanup - trainable.stop did not return in time. Consider making stop a faster operation. Does anyone know what causes this and/or how to fix it? Thanks!
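For reference, my custom stopper is an experiment-wide one built on ray.tune.Stopper, roughly along these lines (a minimal sketch; the metric name and threshold here are placeholders, not my actual stopping condition):

```python
from ray.tune import Stopper

class LossThresholdStopper(Stopper):
    """Stops the whole experiment once any trial reports a low enough loss."""

    def __init__(self, metric="val_loss", threshold=0.05):
        self._metric = metric
        self._threshold = threshold
        self._should_stop = False

    def __call__(self, trial_id, result):
        # Called for every reported result; individual trials are never stopped early here.
        if result.get(self._metric, float("inf")) < self._threshold:
            self._should_stop = True
        return False

    def stop_all(self):
        # Returning True here ends the entire experiment.
        return self._should_stop
```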

A bit more context: each trial trains a neural network using PyTorch, and tune.report is called only once, at the end of training. The problem happens with both BayesOptSearch and HyperOptSearch, and I also use a ConcurrencyLimiter.
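Simplified, the setup looks roughly like this (Ray 1.x import paths; train_fn, val_loss, and the search space are placeholders rather than my actual code):

```python
from ray import tune
from ray.tune.suggest import ConcurrencyLimiter
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.stopper import ExperimentPlateauStopper

def train_fn(config):
    # ... train a PyTorch model using config["lr"] ...
    final_val_loss = 0.1  # placeholder for the real validation loss
    tune.report(val_loss=final_val_loss)  # reported once, at the end of training

search_alg = ConcurrencyLimiter(
    HyperOptSearch(metric="val_loss", mode="min"),
    max_concurrent=4,
)

analysis = tune.run(
    train_fn,
    config={"lr": tune.loguniform(1e-4, 1e-1)},
    num_samples=50,
    search_alg=search_alg,
    # Experiment-wide stopper: end the whole experiment once val_loss plateaus.
    stop=ExperimentPlateauStopper(metric="val_loss", mode="min", std=0.001, top=10, patience=5),
)
```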

Hey @chawins, thanks for making this issue!

Could you post a full reproducible script for us to take a look at?

Thanks!

Thanks, @rliaw. The whole thing is a pretty large code base, but I will see what I can do.
I’m also running the stopper on version 1.1.0 to see if the problem persists.

Hello @chawins, I have also encountered this problem. Have you fixed it? Thanks.

Hi @zmin1217,

Do you have a simple repro script? If so, could you create a GitHub issue with that script attached?

@justinvyu There is no simple script, and I have already set TUNE_FORCE_TRIAL_CLEANUP_S=1 as a temporary fix, which forces cleanup by terminating the actors.
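For reference, one way to apply that workaround is to set the variable in the environment before Tune starts (a sketch; it assumes force-killing the trainable actors after one second is acceptable):

```python
import os

# Must be set before Tune starts so the trial executor picks it up.
os.environ["TUNE_FORCE_TRIAL_CLEANUP_S"] = "1"  # force cleanup after 1 second

from ray import tune
# ... tune.run(...) as usual ...
```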

Hi @zmin1217,

What’s the exact error you are seeing? Is it this:

Skipping cleanup - trainable.stop did not return in time.

What version of Ray are you on?

@justinvyu, yes, it repeatedly outputs
2023-04-13 09:08:08,022 WARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. Consider making stop a faster operation.
The Ray version is 1.8.0.

Could you try upgrading to the latest Ray version? It's hard to provide a fix for such an old version, and this error message is no longer emitted by current versions of Ray.