I use tune.run() to find the best width and depth of my network. However, some configurations lead to out-of-memory errors, and I get the following Ray Tune message:
```
trial_runner.py:1318 -- Blocking for next trial...
```
I have about 300 configurations to run. I would like to continue running all of them even if some configurations lead to errors. Is there any parameter in tune.run() to do that?
@AhmedM First, check how much free memory you have (both GPU and CPU memory). On Linux, try the htop and nvidia-smi commands.
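If you prefer to check from inside Python, here is a quick sketch (this assumes psutil is installed and the nvidia-smi CLI is on your PATH; neither is part of Ray):

```python
import psutil
import subprocess

# Free CPU RAM (the same information htop shows in its "Mem" row).
print(f"Available RAM: {psutil.virtual_memory().available / 1e9:.1f} GB")

# GPU memory: shell out to nvidia-smi for a per-GPU used/total summary.
subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"]
)
```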
Check the maximum number of trials running at the same time; if that number is too big, reduce it. The problem is not the number of configurations but the number of trials running simultaneously and the size of the neural net (a bigger net needs more memory to store its weights). If, say, 6 concurrent trials are too much, try whether 3 work fine; one way to cap this is sketched below.
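A minimal sketch of capping concurrency through resources_per_trial (the trainable, search space, and resource numbers here are placeholders; adjust them to your machine):

```python
from ray import tune

def train_fn(config):
    # Placeholder trainable: a real one would build a network from
    # config["width"] / config["depth"], train it, and report a metric.
    tune.report(loss=config["width"] * config["depth"])

search_space = {
    "width": tune.grid_search([64, 128, 256]),
    "depth": tune.grid_search([2, 4, 8]),
}

# Tune only schedules as many trials as fit in the available resources.
# On a machine with 8 CPUs and 1 GPU, asking for 2 CPUs and 0.5 GPU per
# trial caps concurrency at 2 trials at a time.
tune.run(
    train_fn,
    config=search_space,
    resources_per_trial={"cpu": 2, "gpu": 0.5},
)
```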
Hey @AhmedM! The default behavior for Tune is to continue with the rest of the trials even if one trial errors out. Is your script failing outright, or is it hanging after the errors?
Do you mind also sharing what your call to tune.run looks like?
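For reference, tune.run also exposes two knobs around failures: max_failures (how many times to retry an errored trial) and raise_on_failed_trial (whether tune.run raises a TuneError at the end if any trial finished in the ERROR state). A minimal sketch with a placeholder trainable:

```python
from ray import tune

def train_fn(config):
    # Placeholder trainable; report something so Tune records a result.
    tune.report(loss=config["width"])

tune.run(
    train_fn,
    config={"width": tune.grid_search([64, 128, 256])},
    max_failures=0,               # do not retry trials that error (e.g. on OOM)
    raise_on_failed_trial=False,  # do not raise TuneError at the end of the run
)
```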