Letting a remote function use all CPUs?

I’m experimenting with using Ray to offload compute from within Jupyter notebooks. Some functions will be more efficient if I let them do their own threading (e.g. for RAM or cache reasons), and that will also improve interactive latency. I’m hoping for a way to mark those as remote and have Ray schedule them so they take up all CPUs on whichever host they land on. Is there a way to do this?

I’ve checked whether num_cpus=-1 gets special treatment (it doesn’t) and considered adding a new pseudo-accelerator (a kludge, and it doesn’t really solve the problem). Am I missing something obvious?

Thanks!

Do you know how many CPUs are available on each host in your cluster?

Locally yes, and on AWS yes (but a different number). I’m hoping to avoid the notebooks needing to know the worker host details (and having to update those by hand when I switch environments), though I can see how that would be a workaround.

Hypothetically, you could use ray.cluster_resources() to programmatically determine the worker host details?

That won’t work because ray.cluster_resources() gives you cluster-wide totals, not the per-host details. It turns out an easy kludge is to define a new resource type, “node”, with a quantity of 1.0 per node. It’s a little ugly, but it gets the right behavior and could be put into a wrapper library.

import random
import time
import ray

@ray.remote(resources={'node': 0.001})
def foo(x):
    # Tiny slice of "node": many foo() tasks can pack onto one node.
    time.sleep(1 + random.random() * 0.01)
    return x

@ray.remote(resources={'node': 1})
def bar(x):
    # The whole "node" resource: runs only with the node to itself.
    time.sleep(1 + random.random() * 0.01)
    return x

Above, foo() will bottleneck on cores, while bar() won’t be scheduled until it has the node all to itself.
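
One caveat: a custom resource like “node” has to be declared when each Ray node starts, or Ray won’t know it exists. A minimal sketch, assuming a single-machine dev setup (the cluster-side commands use the standard ray start --resources flag):

import ray

# Local development: declare the custom resource at init time.
ray.init(resources={"node": 1})

# On a cluster, declare it when starting each node instead:
#   ray start --head --resources='{"node": 1}'
#   ray start --address=<head-ip>:6379 --resources='{"node": 1}'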

Another option is to do:

import ray

# cluster_resources() has one "node:<ip>" key per node in the cluster.
node_ids = {node_id for node_id in ray.cluster_resources() if node_id.startswith("node:")}

@ray.remote
def func():
    pass

# One invocation pinned to each node via its "node:<ip>" resource.
[func.options(resources={n: 1}).remote() for n in node_ids]

cc @simon-mo, don’t you have a way to see the per-host information?

To check my understanding: This would fire off one invocation pinned to each node?

Yeah, that is right.
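
A quick way to check, as a hedged sketch: have each pinned invocation report its hostname and confirm they’re all distinct.

import socket
import ray

@ray.remote
def which_host():
    # Each node-pinned invocation should land on a different host.
    return socket.gethostname()

node_ids = {r for r in ray.cluster_resources() if r.startswith("node:")}
print(ray.get([which_host.options(resources={n: 1}).remote() for n in node_ids]))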

Ok, thanks. It’s probably not worth spending more time polishing this, but the case where I end up wanting to do this is compressing video after generating an image sequence. The easiest path is to run ffmpeg as a subprocess, and letting it use all available cores is desirable so I can see the output sooner.
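
For reference, a minimal sketch of that ffmpeg case, assuming the 'node': 1 trick from above; the paths and encoder flags here are illustrative, not prescriptive:

import subprocess
import ray

@ray.remote(resources={'node': 1})
def encode(frames_glob, out_path):
    # ffmpeg does its own threading, so claim the whole node via the
    # custom "node" resource instead of reserving CPUs individually.
    subprocess.run(
        ["ffmpeg", "-y", "-framerate", "30", "-pattern_type", "glob",
         "-i", frames_glob, "-c:v", "libx264", out_path],
        check=True,
    )
    return out_path

out_ref = encode.remote("frames/*.png", "out.mp4")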
