I’m doing hyperparameter optimization of a TensorFlow model using ray.tune, a task similar to the one posted here:
I have resources={'cpu': 1, 'gpu': 1}, but the GPU memory is not being cleared after one of the trials finishes. I have tried sleeping and wait_for_gpu, but the memory never clears. Any tips on how to fix this?
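Roughly my setup, simplified (the real training code is longer; the objective body here is just a placeholder):

from ray import tune
from ray.tune.utils import wait_for_gpu

def objective(config):
    # build and train the TensorFlow model here, then report metrics to Tune
    ...
    wait_for_gpu()  # wait for GPU memory utilization to drop; it never does

tune.run(objective, resources_per_trial={"cpu": 1, "gpu": 1})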
Try defining the resources with a PlacementGroupFactory:
from ray.tune.execution.placement_groups import PlacementGroupFactory
resources=PlacementGroupFactory([{"CPU": 1, "GPU": 1}])
Then pass the resources variable to tune.run instead of a dict.
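For example, something along these lines (a minimal sketch using the tune.run API; objective stands in for your own trainable):

from ray import tune
from ray.tune.execution.placement_groups import PlacementGroupFactory

def objective(config):
    # placeholder: build and train your TensorFlow model, then return a metric
    return {"score": 0.0}

resources = PlacementGroupFactory([{"CPU": 1, "GPU": 1}])

tune.run(
    objective,
    resources_per_trial=resources,  # the PlacementGroupFactory, not a plain dict
    num_samples=4,
)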
Also, at the end of my objective function, I used torch.cuda.empty_cache() to clear GPU memory. I haven’t used TensorFlow for a long time, but there should be an equivalent way to do that.
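If I remember correctly, the usual Keras/TF counterpart is to clear the session when the trial finishes, something like this (build_model and the training data are placeholders for your own code):

import gc
import tensorflow as tf

def objective(config):
    model = build_model(config)       # placeholder for your model-building code
    model.fit(x_train, y_train)       # train as usual and report metrics to Tune
    tf.keras.backend.clear_session()  # drop the global Keras/TF graph state
    del model
    gc.collect()                      # force Python to release the model objects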
Sadly that didn’t work. I’ve also tried what I believe is the closest TensorFlow equivalent of torch.cuda.empty_cache(), which is to close the session and reset the graph, but that didn’t work either.
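For reference, this is roughly the pattern I mean (TF1-style compat API, simplified):

import tensorflow as tf

sess = tf.compat.v1.Session()
# ... build the graph and run training inside this session ...
sess.close()                        # close the session
tf.compat.v1.reset_default_graph()  # reset the default graph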