Letting a remote function use all CPUs?

I’m experimenting with using Ray to offload compute from within Jupyter notebooks. Some functions will be more efficient if I let them do their own threading (e.g. for RAM or cache reasons), and that will also improve interactive latency. I’m hoping for a way to mark those as remote and have Ray schedule them so they take up all CPUs on whichever host they land on. Is there a way to do this?

I’ve checked whether num_cpus=-1 gets special treatment (it doesn’t) and considered adding a new pseudo-accelerator (a kludge, and it doesn’t really solve the problem). Am I missing something obvious?

Thanks!

Do you know how many CPUs are available on each host in your cluster?

Locally yes, and on AWS yes (but a different number). I’m hoping to avoid the notebooks needing to know the worker host details (and having to update those by hand when I switch environments), though I can see how that would be a workaround.

Hypothetically, you could use ray.cluster_resources() to programmatically determine the worker host details?

That won’t work because ray.cluster_resources() gives you cluster-wide totals, not the per-host details. It turns out an easy kludge is to define a new resource type, “node”, with a quantity of 1.0 per node. It’s a little ugly, but it gets the right behavior and could be put into a wrapper library.

import random
import time
import ray

@ray.remote(resources={'node': 0.001})
def foo(x):
    # Tiny slice of "node": many foo() tasks can pack onto one node.
    time.sleep(1 + random.random() * 0.01)
    return x

@ray.remote(resources={'node': 1})
def bar(x):
    # The whole "node" resource: runs only with the node to itself.
    time.sleep(1 + random.random() * 0.01)
    return x

Above, foo() will bottleneck on cores, while bar() won’t be scheduled until it has the node all to itself.
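
One caveat: a custom resource like “node” has to be declared when each Ray node starts, or Ray won’t know it exists. A minimal sketch, assuming a single-machine dev setup (the cluster-side commands use the standard ray start --resources flag):

import ray

# Local development: declare the custom resource at init time.
ray.init(resources={"node": 1})

# On a cluster, declare it when starting each node instead:
#   ray start --head --resources='{"node": 1}'
#   ray start --address=<head-ip>:6379 --resources='{"node": 1}'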

Another option is to do:

import ray

# cluster_resources() has one "node:<ip>" key per node in the cluster.
node_ids = {node_id for node_id in ray.cluster_resources() if node_id.startswith("node:")}

@ray.remote
def func():
    pass

# One invocation pinned to each node via its "node:<ip>" resource.
[func.options(resources={n: 1}).remote() for n in node_ids]

cc @simon-mo, don’t you have a way to see the per-host information?

To check my understanding: This would fire off one invocation pinned to each node?

Yeah, that is right.
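
A quick way to check, as a hedged sketch: have each pinned invocation report its hostname and confirm they’re all distinct.

import socket
import ray

@ray.remote
def which_host():
    # Each node-pinned invocation should land on a different host.
    return socket.gethostname()

node_ids = {r for r in ray.cluster_resources() if r.startswith("node:")}
print(ray.get([which_host.options(resources={n: 1}).remote() for n in node_ids]))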

Ok, thanks. It’s probably not worth spending more time polishing this, but the case where I end up wanting to do this is compressing video after generating an image sequence. The easiest path is to run ffmpeg as a subprocess, and letting it use all available cores is desirable so I can see the output sooner.
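
For reference, a minimal sketch of that ffmpeg case, assuming the 'node': 1 trick from above; the paths and encoder flags here are illustrative, not prescriptive:

import subprocess
import ray

@ray.remote(resources={'node': 1})
def encode(frames_glob, out_path):
    # ffmpeg does its own threading, so claim the whole node via the
    # custom "node" resource instead of reserving CPUs individually.
    subprocess.run(
        ["ffmpeg", "-y", "-framerate", "30", "-pattern_type", "glob",
         "-i", frames_glob, "-c:v", "libx264", out_path],
        check=True,
    )
    return out_path

out_ref = encode.remote("frames/*.png", "out.mp4")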
