I’m working with Ray on a SLURM-managed cluster and am puzzled by how Ray distributes work among its workers. In a test case I’m executing a benchmark function 8 times on a cluster of 4 nodes (1 head node, 3 workers). From the output it seems like the function is always executed on the same node.
All worker nodes and the head node are identical (num_cpus=32), and there are no special resource requirements in the benchmark function’s decorator. What am I missing?
I’m also observing that the same benchmark code runs about 3x slower on the Ray cluster than locally. Is this because the same worker is executing multiple instances of run_benchmark(A) simultaneously?
Source code and output are here: gist:4d1ff3d8ead4a46cf51cb750759e8a21 · GitHub