I am getting the following warning from the autoscaler when running a Ray job with 8 GPUs using map_batches:
Warning: The following resource request cannot be scheduled right now: {'GPU': 1.0, 'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
This is the resource status:
Usage:
8.0/64.0 CPU
8.0/8.0 GPU
As you can see, only 8 CPUs are being used. In the Ray dashboard, I see under clusters that there are 64 worker nodes in the head node, but only 8 of them are being used. The rest have status IDLE.
How can I have all the worker nodes be used, and why are the rest idle? Is this limited by the number of GPUs? I tried providing 32 GPUs, but that just creates 3 head nodes with the same issue: only 8 of the 64 worker nodes in each head node are being used.
@Akshay_Kenchappa_Man Can you provide the code where you are using map_batches and specifying the resources for the Ray task?
there are 64 worker nodes in the head node
Do you mean worker processes?
creates 3 head nodes
A Ray cluster can only have one head node.
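As a rough illustration of why the status shows 8.0/64.0 CPU, here is a small sketch of the arithmetic. It assumes each map_batches worker requests {'GPU': 1.0, 'CPU': 1.0}, as the warning suggests; the original code has not been posted, so this is a guess. The helper `max_concurrent_workers` is a hypothetical name for illustration and not part of the Ray API.

```python
# Hypothetical helper (not a Ray API): how many workers with a given
# resource request fit into the cluster at the same time.
def max_concurrent_workers(cluster, per_worker):
    # The scarcest resource decides how many workers can be placed.
    return int(min(cluster[name] // amount
                   for name, amount in per_worker.items()
                   if amount > 0))

# Totals taken from the resource status in the question.
cluster = {"CPU": 64.0, "GPU": 8.0}
# Assumed per-worker request, taken from the warning message.
per_worker = {"CPU": 1.0, "GPU": 1.0}

workers = max_concurrent_workers(cluster, per_worker)
cpus_in_use = workers * per_worker["CPU"]
print(workers, cpus_in_use)  # prints: 8 8.0
```

Under that assumption, the 8 GPUs cap the job at 8 workers, so only 8 CPUs are claimed and the other worker processes stay idle. Any further request for a GPU cannot be scheduled, which would match the warning.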