IDLE ray worker nodes in map_batches

Akshay_Kenchappa_Man · May 24, 2023, 9:22pm

I am getting the following warning from the autoscaler when running a Ray job with 8 GPUs using map_batches:
Warning: The following resource request cannot be scheduled right now: {'GPU': 1.0, 'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
This is the resource status:
Usage:
8.0/64.0 CPU
8.0/8.0 GPU
As you can see, only 8 CPUs are being used. In the Ray dashboard, I see under clusters that there are 64 worker nodes in the head node but only 8 of them are being used. The rest have the status IDLE.
How can I get all the worker nodes to be used, and why are they idle? Is this limited by the number of GPUs? I tried providing 32 GPUs, but that just creates 3 head nodes with the same issue: only 8 of the 64 worker nodes in each head node are being used.

Jules_Damji · May 24, 2023, 9:50pm

@Akshay_Kenchappa_Man Can you provide the code where you are using map_batches and specifying the resources for the Ray task?

cc: @jjyao @cade

jjyao · May 24, 2023, 10:54pm

Hi @Akshay_Kenchappa_Man,

> there are 64 worker nodes in the head node

Do you mean worker processes?

> creates 3 head nodes

A ray cluster can only have one head node.
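
For what it's worth, the numbers in the resource status are consistent with simple arithmetic on the request shown in the warning. The sketch below is plain Python, not a Ray API: `max_concurrent` is a hypothetical helper, and it assumes every map_batches worker asks for exactly `{'GPU': 1.0, 'CPU': 1.0}` and that the cluster is treated as one pool of 64 CPUs and 8 GPUs (Ray's real scheduler places work per node, so this is only a rough model).

```python
def max_concurrent(cluster, per_task):
    """Hypothetical helper: how many tasks fit at once in a pooled cluster."""
    # The scarcest resource (relative to what each task asks for) sets the limit.
    return int(min(
        cluster.get(name, 0.0) / amount
        for name, amount in per_task.items()
        if amount > 0
    ))


# Totals taken from the resource status in the original post.
cluster = {"CPU": 64.0, "GPU": 8.0}
# Per-task request taken from the autoscaler warning.
per_task = {"GPU": 1.0, "CPU": 1.0}

slots = max_concurrent(cluster, per_task)
used = {name: slots * per_task.get(name, 0.0) for name in cluster}

for name, total in cluster.items():
    print(f"{used[name]}/{total} {name}")
```

Under these assumptions the GPUs run out after 8 tasks, so only 8 CPUs are ever claimed and the remaining CPU slots have nothing to run unless some other CPU-only work is scheduled. Whether that is what is happening here depends on the actual map_batches call, which is why the code was requested above.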
