Adding memory in resources_per_trial hangs

Hi, I am using Ray Tune to do hyperparameter tuning.

I noticed that when I pass
resources_per_trial = {"cpu": 4, "gpu": 1}
it works.

However, when I add memory, it hangs:
resources_per_trial = {"cpu": 4, "gpu": 1, "memory": 1024*1024}
(memory's unit is bytes, I believe).

I have 16 GB of memory allocated for the Ray cluster, so it should be enough. Does anyone know why it hangs?
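For reference, the units can be sanity-checked in plain Python (no Ray needed). The value passed above, 1024*1024 bytes, is only 1 MiB per trial, which is tiny next to a 16 GiB cluster:

```python
# Ray's "memory" resource is specified in bytes.
per_trial = 1024 * 1024       # the value passed above: 1 MiB
cluster = 16 * 1024 ** 3      # 16 GiB allocated to the Ray cluster

print(per_trial)              # 1048576 bytes = 1 MiB
print(cluster // per_trial)   # 16384: how many such requests fit in 16 GiB
```

So the per-trial request itself is far below the cluster's capacity.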

We are using Ray 1.12.

Hey @raytune_kuberay_user, a very simple workload runs successfully for me:

import ray
from ray import tune

def train_fn(config):
    return 1

print(ray.available_resources())
tune.run(train_fn, config={}, resources_per_trial={"cpu": 1, "memory": 1024*1024})

My hunch is that this is not something on the Ray Tune side, but rather in the cluster/KubeRay deployment.

Can you run print(ray.available_resources()) and share the output? You should see memory included in the output, something like this: {'node:': 1.0, 'CPU': 16.0, 'memory': 3314275124.0, 'object_store_memory': 1657137561.0}
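As an illustration (using the sample output above rather than live cluster values), the check being suggested is just whether the returned dict contains a "memory" key that covers the per-trial request:

```python
# Sample output of ray.available_resources(), copied from the reply above.
available = {'node:': 1.0, 'CPU': 16.0, 'memory': 3314275124.0,
             'object_store_memory': 1657137561.0}

requested = 1024 * 1024  # per-trial memory request from the question, in bytes

print('memory' in available)                     # True: the resource is registered
print(available.get('memory', 0) >= requested)   # True: enough memory to schedule
```

If "memory" were missing from the dict, no trial requesting it could ever be scheduled, which would explain the hang.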


Thanks. We later realized that this is a bug that was fixed in 1.13.0: [tune] Fix memory resources for head bundle #23861