Then how can I deal with the “GPU memory not released by previous worker” problem that max_calls=1 is designed to address? In my current Tune experiment, where each trial uses 1 GPU, about half of the trials consistently fail with a CUDA out-of-memory error. I suspect these trials are being started too soon, before the memory held by the previous worker has been released.
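To make the idea concrete, here is a minimal sketch of the workaround I have in mind: poll the GPU at the start of each trial and only begin training once enough memory is free. The helper name `wait_for_free_memory`, its parameters, and the thresholds are all hypothetical and not part of Ray. I believe Tune also ships a similar helper, `tune.utils.wait_for_gpu`, but I have not verified that it fixes my case.

```python
import time
from typing import Callable


def wait_for_free_memory(
    get_free_bytes: Callable[[], int],
    min_free_bytes: int,
    retries: int = 20,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until at least `min_free_bytes` are free, or give up.

    `get_free_bytes` is a caller-supplied probe (hypothetical name) that
    returns the currently free GPU memory in bytes.
    Returns True once enough memory is free, False if all retries fail.
    """
    for attempt in range(retries):
        if get_free_bytes() >= min_free_bytes:
            return True
        # Do not sleep after the final failed attempt.
        if attempt < retries - 1:
            sleep(delay_s)
    return False
```

In a PyTorch trainable, the probe could be something like `lambda: torch.cuda.mem_get_info()[0]`, called before the model is built. This only delays the trial; it does not force the previous worker to release its memory.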