How does ray decide where to run a function?

Hi,
I’m working with ray on a SLURM-managed cluster and am puzzled how ray distributes work among its workers. In a test case I’m executing a benchmark function 8 times on a cluster of 4 nodes (1 head node, 3 workers). From the output it seems like the function is always executed on the same node.
The head node and all worker nodes are identical (num_cpus=32), and there are no special requirements in the benchmark function's decorator. What am I missing?

I'm also observing that the same benchmark code runs about 3x slower on the Ray cluster than locally. Is this because the same worker node is executing multiple instances of `run_benchmark(A)` simultaneously?

Source code and output are here: gist:4d1ff3d8ead4a46cf51cb750759e8a21 · GitHub

I think I solved the problem. The missing piece is to modify the decorator to `@ray.remote(num_cpus=32)`.

How exactly is this information used internally by Ray? Does it tag the function as taking up 32 CPU cores of a worker node, and so inform how other functions will be distributed across the cluster? That is, if a worker with 32 CPU cores is already executing a function decorated with `@ray.remote(num_cpus=32)`, it won't be executing another one of these functions at the same time?
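To make the question concrete, here is a toy sketch of the mental model I have in mind. This is *not* Ray's real scheduler (which also handles locality, spillback, etc.), just a simplified greedy model of logical resource accounting: each task declares `num_cpus`, and a node only accepts a task if it fits within the node's remaining logical CPUs. The node names and `schedule` helper are made up for illustration.

```python
# Simplified model of logical-resource scheduling (NOT Ray's actual
# implementation): tasks declare num_cpus, and a node can only host
# tasks whose declared CPUs fit within its remaining capacity.

class Node:
    def __init__(self, name, total_cpus):
        self.name = name
        self.total_cpus = total_cpus
        self.used_cpus = 0

    def can_fit(self, num_cpus):
        return self.total_cpus - self.used_cpus >= num_cpus


def schedule(tasks, nodes):
    """Greedy placement: each (task_id, num_cpus) pair goes to the first
    node with enough free logical CPUs; otherwise it waits (None)."""
    placement = {}
    for task_id, num_cpus in tasks:
        for node in nodes:
            if node.can_fit(num_cpus):
                node.used_cpus += num_cpus
                placement[task_id] = node.name
                break
        else:
            placement[task_id] = None  # queued until CPUs free up
    return placement


# With the default num_cpus=1, all 8 tasks fit on the first node,
# which matches the "everything runs on one node" behavior I saw:
nodes = [Node(f"node{i}", 32) for i in range(4)]
light = schedule([(i, 1) for i in range(8)], nodes)

# With num_cpus=32, each task fills a whole node, so the first 4 tasks
# spread across the 4 nodes and the remaining 4 have to wait:
nodes = [Node(f"node{i}", 32) for i in range(4)]
heavy = schedule([(i, 32) for i in range(8)], nodes)
```

If this model is roughly right, it would explain both observations: with the default `num_cpus=1` all 8 tasks fit on one node (and contend with each other there), while with `num_cpus=32` each task gets a node to itself.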