Hello, I would like to use Ray's multiprocessing Pool to run a few distributed jobs. However, I seem to be running into an issue with how resources are allocated.
I am using ray==0.8.7
My actual code is a bit more complex, but I was able to reproduce the issue with a simple example inspired by the documentation:
import time

import ray
from ray import tune
from ray.util.multiprocessing import Pool


def evaluation_fn(step, width, height):
    time.sleep(0.1)
    return (0.1 + width * step / 100) ** (-1) + height * 0.1


def easy_objective(config):
    width, height = config["width"], config["height"]
    for step in range(config["steps"]):
        intermediate_score = evaluation_fn(step, width, height)
        tune.report(iterations=step, mean_loss=intermediate_score)


def run_example(num_samples):
    print(f"cluster resources {ray.cluster_resources()}")
    print(f"cluster available resources {ray.available_resources()}")
    _ = tune.run(
        easy_objective,
        num_samples=num_samples,
        config={
            "steps": 5,
            "width": tune.uniform(0, 20),
            "height": tune.uniform(-100, 100),
            "activation": tune.grid_search(["relu", "tanh"])
        })


def main():
    start = time.time()
    pool = Pool()
    for result in pool.map(run_example, [5, 6]):
        print(result)
    end = time.time()
    delta = end - start
    print(f'Took {delta:.3f} seconds')


if __name__ == '__main__':
    main()
It seems that Ray correctly detects the 8 cores I have, but they all get claimed and the run never actually starts.
Here is a sample of the output:
(pid=3675) cluster resources {'object_store_memory': 32.0, 'memory': 93.0, 'node:192.168.0.81': 1.0, 'CPU': 8.0}
(pid=3675) cluster available resources {'node:192.168.0.81': 1.0, 'object_store_memory': 32.0, 'memory': 93.0}
....
2021-03-09 17:58:21,740 WARNING worker.py:1134 -- The actor or task with ID X is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {node:192.168.0.81: 1.000000}, {memory: 4.541016 GiB}, {object_store_memory: 1.562500 GiB}. In total there are 0 pending tasks and 16 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
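If I read that warning correctly, the Pool's worker actors claim all 8 CPUs up front, so the Tune trials have nothing left to be scheduled on. For illustration, this is the kind of change I would expect to leave some CPUs free for the trials (just a sketch; I am assuming the processes argument of Ray's Pool behaves like the standard library's):

    # Hypothetical mitigation: cap the number of Pool actors so the Tune
    # trials started inside run_example still have CPUs available.
    pool = Pool(processes=2)
    for result in pool.map(run_example, [5, 6]):
        print(result)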
The same code worked fine when I used Python's built-in multiprocessing Pool, so I was wondering if you had any idea why this happens with Ray?
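For reference, the variant that worked only swapped the Pool import; everything else stayed as in the script above (sketch below):

    # Variant using the standard library pool instead of Ray's;
    # run_example and the rest of the script above are unchanged.
    from multiprocessing import Pool

    def main():
        pool = Pool()
        for result in pool.map(run_example, [5, 6]):
            print(result)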
Thanks!