The tuner runs normally for the first 24 of 64 trials, but once it passes a certain number of trials, every remaining trial fails with RayOutOfMemoryError.
I am using the Trainable class API, so I delete variables explicitly in cleanup(), like below:
```python
import gc

import torch


def cleanup(self):
    # Drop references to large objects so they become collectible
    del self.runner
    del self.model
    del self.optimizer
    del self.train_data_loaders
    del self.valid_dataset
    del self.valid_iter
    del self.config
    gc.collect()               # reclaim CPU memory, including reference cycles
    torch.cuda.empty_cache()   # release cached GPU memory back to the driver
```
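To sanity-check the cleanup pattern itself, here is a minimal, self-contained sketch (the `Model` and `Trial` classes are hypothetical stand-ins, not my actual code) that uses a weak reference to confirm that `del` plus `gc.collect()` really frees the object:

```python
import gc
import weakref


class Model:
    """Stand-in for a large object such as a torch model (hypothetical)."""


class Trial:
    """Minimal stand-in for a Trainable that releases state in cleanup()."""

    def __init__(self):
        self.model = Model()

    def cleanup(self):
        del self.model   # drop the attribute reference
        gc.collect()     # collect immediately, including reference cycles


trial = Trial()
ref = weakref.ref(trial.model)   # track the object without keeping it alive
trial.cleanup()
assert ref() is None             # the model object was actually freed
```

If an equivalent check against the real Trainable showed objects surviving cleanup, that would point to lingering references (e.g. closures, logger handles, or cached results) rather than a Ray issue.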
The number of simultaneous trials can be up to 4, since there are 4 GPUs and each trial requires 1 GPU.
I would appreciate any advice on how to troubleshoot this.
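For context, the resource setup described above corresponds roughly to the following Ray 2.x configuration (this is a sketch; `MyTrainable` and `num_samples=64` are placeholders matching the description, not my exact code):

```python
from ray import tune

tuner = tune.Tuner(
    # Each trial reserves one GPU, so at most 4 trials run concurrently
    # on a 4-GPU machine.
    tune.with_resources(MyTrainable, {"cpu": 1, "gpu": 1}),
    tune_config=tune.TuneConfig(num_samples=64),
)
results = tuner.fit()
```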