I am using Pool in a Ray cluster. I want to be able to scale the number of jobs sent to different nodes proportionately to the compute capability (e.g., the number of CPUs) that each node has. Unfortunately, the Ray cluster pool I set up is sending the same number of jobs to different nodes even though the nodes have different sizes/different numbers of CPUs.
For example, I have four nodes in the cluster:
Node 1: 64 CPUs [Head]
Node 2: 64 CPUs
Node 3: 48 CPUs
Node 4: 16 CPUs
While initializing the cluster/adding each node, I specified to Ray the number of CPUs that each node has. However, if I run highly parallelizable, completely independent processes, say 192 of them, Ray sends 192/4 = 48 to each node. This is not desirable because the processes on Node 4 finish last (taking about 3 times as long as the other nodes), while the processes on Nodes 1 to 3 finish at almost the same time. So the entire job is slowed down by Node 4.
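For reference, this is roughly how I registered each node with its CPU count (the addresses and ports below are placeholders, not my actual values):

# On the head node (Node 1):
ray start --head --num-cpus=64

# On each worker node, pointing at the head node's address:
ray start --address=<head-node-ip>:6379 --num-cpus=64   # Node 2
ray start --address=<head-node-ip>:6379 --num-cpus=48   # Node 3
ray start --address=<head-node-ip>:6379 --num-cpus=16   # Node 4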
What I intend to achieve is a 64:64:48:16 split (one process per CPU: 64 + 64 + 48 + 16 = 192), as follows.
Node 1: 64 processes
Node 2: 64 processes
Node 3: 48 processes
Node 4: 16 processes
What Ray is currently doing is a 48:48:48:48 split, as follows.
Node 1: 48 processes
Node 2: 48 processes
Node 3: 48 processes
Node 4: 48 processes
I would appreciate any suggestions that can help me achieve something close to 64:64:48:16 (rather than 48:48:48:48).
The following is sample code I am using to test/debug the cluster.
import ray
from ray.util.multiprocessing import Pool

# Connect to the running cluster and create a pool of actor processes.
ray.init(address='auto')
pool_1 = Pool()

# A slow enough CPU-intensive function just to test the idea
def fibonacci(n):
    if n <= 0:
        raise ValueError("n must be a positive integer")
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

results = pool_1.map(fibonacci, [38] * 192)
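To check where the tasks actually run, I tag each result with the IP of the node that computed it. This is a minimal diagnostic sketch (fibonacci_tagged is a throwaway helper I added; it assumes ray.util.get_node_ip_address() returns the IP of the node the worker is running on):

from collections import Counter

def fibonacci_tagged(n):
    # Each worker reports the node it is running on alongside its result.
    return ray.util.get_node_ip_address(), fibonacci(n)

# A smaller n keeps the check fast; still 192 tasks as before.
tagged = pool_1.map(fibonacci_tagged, [30] * 192)
print(Counter(ip for ip, _ in tagged))  # e.g. Counter({'10.0.0.1': 48, '10.0.0.2': 48, ...})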