Hello,
I use tune.run() to find the best width and depth of my network. However, some configurations lead to out-of-memory errors, and I get the following Ray Tune message:
trial_runner.py:1318 -- Blocking for next trial...
I have about 300 configurations to run. I would like to continue running all the configurations even if some of them lead to errors. Is there any parameter in tune.run() to do that?
Thank you
@AhmedM First, check how much free memory you have (both GPU and CPU memory).
On Linux, try the htop and nvidia-smi commands.
Check the maximum number of trials running at the same time; if that number is too big, reduce it.
The problem is not the number of configurations but the number of trials running at the same time and the size of the neural net (a bigger network needs more memory to store its weights).
Try whether 3 concurrent trials work fine; 6 may already be too many. For example, see the sketch below.
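A minimal sketch of what I mean, assuming the tune.run API (the trainable and search space here are made-up placeholders, not your actual code):

```python
from ray import tune

def my_trainable(config):
    # Placeholder objective; stands in for building and training
    # a network with the given width and depth.
    tune.report(score=config["width"] * config["depth"])

analysis = tune.run(
    my_trainable,
    config={
        "width": tune.choice([64, 128, 256]),
        "depth": tune.choice([2, 4, 6]),
    },
    num_samples=300,
    max_concurrent_trials=3,  # cap how many trials run at once
    resources_per_trial={"cpu": 2, "gpu": 1},  # explicit per-trial resources also limit concurrency
)
```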
Hi @Peter_Pirog,
I run the configurations sequentially, one by one, with max_concurrent_trials=1.
What I would like to do is to allow Ray Tune to run the next configuration even if the previous one leads to an error.
Thanks
Hey @AhmedM! The default behavior for Tune is to continue with the rest of the trials even if a trial fails. Is your script failing outright, or is it hanging after the errors occur?
Do you mind also sharing what your call to tune.run looks like?
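For reference, a minimal sketch of a call that keeps going past errored trials, assuming the tune.run API (the trainable and the simulated failure below are invented for illustration):

```python
from ray import tune

def my_trainable(config):
    # Invented placeholder that simulates an occasional OOM-style crash.
    if config["width"] * config["depth"] > 1024:
        raise RuntimeError("simulated out-of-memory error")
    tune.report(score=config["width"] * config["depth"])

analysis = tune.run(
    my_trainable,
    config={
        "width": tune.choice([64, 128, 256]),
        "depth": tune.choice([2, 4, 8]),
    },
    num_samples=20,
    max_failures=0,               # don't retry crashed trials; just mark them ERROR
    raise_on_failed_trial=False,  # don't raise TuneError at the end if some trials errored
)

# Failed trials are recorded with ERROR status; the rest keep running.
print([trial.status for trial in analysis.trials])
```

Note that raise_on_failed_trial only affects whether a TuneError is raised after all trials complete; either way, a trial that errors does not stop the remaining trials.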