I use tune.run() to find the best width and depth of my network. However, some configurations lead to out-of-memory errors, and I get the following Ray Tune message:
```
trial_runner.py:1318 -- Blocking for next trial...
```
I have about 300 configurations to run. I would like to continue running all of them even if some configurations lead to errors. Is there any parameter in tune.run() to do that?
@AhmedM First, check how much free memory you have (both GPU and CPU memory). On Linux, try the htop and nvidia-smi commands.
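If you prefer to check from inside Python, here is a quick sketch (this assumes psutil is installed and the nvidia-smi CLI is on your PATH; neither is part of Ray):

```python
import psutil
import subprocess

# Free CPU RAM (the same information htop shows in its "Mem" row).
print(f"Available RAM: {psutil.virtual_memory().available / 1e9:.1f} GB")

# GPU memory: shell out to nvidia-smi for a per-GPU used/total summary.
subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"]
)
```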
Check the maximum number of trials running at the same time; if that number is too big, reduce it. The problem is not the number of configurations but the number of trials running simultaneously and the size of the neural net (a bigger net needs more memory to store its weights). If, say, 6 concurrent trials are too much, try whether 3 work fine; one way to cap this is sketched below.
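A minimal sketch of capping concurrency through resources_per_trial (the trainable, search space, and resource numbers here are placeholders; adjust them to your machine):

```python
from ray import tune

def train_fn(config):
    # Placeholder trainable: a real one would build a network from
    # config["width"] / config["depth"], train it, and report a metric.
    tune.report(loss=config["width"] * config["depth"])

search_space = {
    "width": tune.grid_search([64, 128, 256]),
    "depth": tune.grid_search([2, 4, 8]),
}

# Tune only schedules as many trials as fit in the available resources.
# On a machine with 8 CPUs and 1 GPU, asking for 2 CPUs and 0.5 GPU per
# trial caps concurrency at 2 trials at a time.
tune.run(
    train_fn,
    config=search_space,
    resources_per_trial={"cpu": 2, "gpu": 0.5},
)
```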
Hey @AhmedM! The default behavior for Tune is to continue with the rest of the trials even if one trial errors out. Is your script failing outright, or is it hanging after the errors?
Do you mind also sharing what your call to tune.run looks like?
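For reference, tune.run also exposes two knobs around failures: max_failures (how many times to retry an errored trial) and raise_on_failed_trial (whether tune.run raises a TuneError at the end if any trial finished in the ERROR state). A minimal sketch with a placeholder trainable:

```python
from ray import tune

def train_fn(config):
    # Placeholder trainable; report something so Tune records a result.
    tune.report(loss=config["width"])

tune.run(
    train_fn,
    config={"width": tune.grid_search([64, 128, 256])},
    max_failures=0,               # do not retry trials that error (e.g. on OOM)
    raise_on_failed_trial=False,  # do not raise TuneError at the end of the run
)
```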