Adding memory in resources_per_trial hangs

Hi, I am using Ray Tune to do hyperparameter tuning.

I noticed that when I pass
resources_per_trial = {"cpu": 4, "gpu": 1}
it works.

However, when I add memory, it hangs:
resources_per_trial = {"cpu": 4, "gpu": 1, "memory": 1024*1024}
(memory's unit is bytes, I believe).

I have 16 GB of memory allocated for the Ray cluster, so it should be enough. Does anyone know why it hangs?
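For reference, the units can be sanity-checked in plain Python (no Ray needed). The value passed above, 1024*1024 bytes, is only 1 MiB per trial, which is tiny next to a 16 GiB cluster:

```python
# Ray's "memory" resource is specified in bytes.
per_trial = 1024 * 1024       # the value passed above: 1 MiB
cluster = 16 * 1024 ** 3      # 16 GiB allocated to the Ray cluster

print(per_trial)              # 1048576 bytes = 1 MiB
print(cluster // per_trial)   # 16384: how many such requests fit in 16 GiB
```

So the per-trial request itself is far below the cluster's capacity.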

We are using Ray 1.12.

Hey @raytune_kuberay_user, a very simple workload runs successfully for me:

import ray
from ray import tune

def train_fn(config):
    return 1

print(ray.available_resources())
tune.run(train_fn, config={}, resources_per_trial={"cpu": 1, "memory": 1024*1024})

My hunch is that this is not something on the Ray Tune side, but rather in the cluster/KubeRay deployment.

Can you run print(ray.available_resources()) and share the output? You should see memory included in the output, something like this: {'node:': 1.0, 'CPU': 16.0, 'memory': 3314275124.0, 'object_store_memory': 1657137561.0}
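As an illustration (using the sample output above rather than live cluster values), the check being suggested is just whether the returned dict contains a "memory" key that covers the per-trial request:

```python
# Sample output of ray.available_resources(), copied from the reply above.
available = {'node:': 1.0, 'CPU': 16.0, 'memory': 3314275124.0,
             'object_store_memory': 1657137561.0}

requested = 1024 * 1024  # per-trial memory request from the question, in bytes

print('memory' in available)                     # True: the resource is registered
print(available.get('memory', 0) >= requested)   # True: enough memory to schedule
```

If "memory" were missing from the dict, no trial requesting it could ever be scheduled, which would explain the hang.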


Thanks. We later realized that this is a bug that was fixed in 1.13.0: [tune] Fix memory resources for head bundle #23861