I am using Pool in a Ray cluster. I want to be able to scale the number of jobs sent to different nodes proportionately to the compute capability (e.g., the number of CPUs) that each node has. Unfortunately, the Ray cluster pool I set up is sending the same number of jobs to different nodes even though the nodes have different sizes/different numbers of CPUs.
For example, I have four nodes in the cluster
Node 1: 64 CPUs [Head]
Node 2: 64 CPUs
Node 3: 48 CPUs
Node 4: 16 CPUs
While initializing the cluster/adding each of the nodes, I specified to Ray the number of CPUs that each node has. However, when I run highly parallelizable, completely independent processes, say 192 of them, Ray sends 192/4 = 48 to each node. This is not desirable because the processes on Node 4 finish last (taking about 3 times as long as the other nodes), while the processes on Nodes 1 to 3 finish at almost the same time. So, the entire job is slowed down by Node 4.
What I intend to achieve is a CPU-proportional split (64:64:48:16), as follows.
Node 1: 64 processes
Node 2: 64 processes
Node 3: 48 processes
Node 4: 16 processes
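The target counts are just each node's CPU count, since the CPUs sum to exactly 192. A minimal sketch of the proportional calculation I have in mind (node names are illustrative):

```python
# Illustrative only: split a job count across nodes in proportion to CPUs.
cpus = {"node1": 64, "node2": 64, "node3": 48, "node4": 16}
total_jobs = 192
total_cpus = sum(cpus.values())  # 192

# Each node's share of the jobs, proportional to its CPU count.
alloc = {node: total_jobs * n // total_cpus for node, n in cpus.items()}
print(alloc)  # → {'node1': 64, 'node2': 64, 'node3': 48, 'node4': 16}
```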
What Ray is currently doing is an even split (48:48:48:48), as follows.
Node 1: 48 processes
Node 2: 48 processes
Node 3: 48 processes
Node 4: 48 processes
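The roughly 3x slowdown follows directly from this even split: with 48 processes each, Node 4 (16 CPUs) has to run its share in three waves, while the other nodes finish in one. A quick sketch of that arithmetic:

```python
import math

# CPU counts per node, and Ray's current even split of 192 jobs.
cpus = {"node1": 64, "node2": 64, "node3": 48, "node4": 16}
jobs_per_node = 192 // 4  # 48 jobs on every node

# Number of "waves" each node needs to work through its 48 jobs.
waves = {node: math.ceil(jobs_per_node / c) for node, c in cpus.items()}
print(waves)  # → {'node1': 1, 'node2': 1, 'node3': 1, 'node4': 3}
```

Node 4 needs three full waves, so the whole `map` call takes about three times as long as it would under a proportional split.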
I will appreciate any suggestions that can help me achieve something close to 64:64:48:16 (rather than 48:48:48:48).
The following is the sample code I am using to test/debug the cluster.

```python
import ray
from ray.util.multiprocessing import Pool

ray.init(address='auto')
pool_1 = Pool()

# A slow enough CPU-intensive function just to test the idea
def fibonacci(n):
    if n <= 0:
        return "Invalid"
    elif n == 1:
        return 0
    elif n == 2:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

# 192 independent tasks; map needs one input per task,
# e.g. 192 copies of the same n
results = pool_1.map(fibonacci, [32] * 192)
```