How does Ray load-balance Actors across Ray Workers?

Seanny123 · November 28, 2021, 10:20pm

As I describe in another question, I’m sending work to hundreds of Ray Actors using a Queue. These Ray Actors are distributed across several Ray Worker nodes. How does Ray decide which Actors to place on which nodes?

I ask because currently two of my nodes are being given almost all the Actors (according to the Ray dashboard) and are running at max capacity, while my third is sitting idle. This can be seen in the Grafana plot below, where each colour shows the CPU usage of one of my Workers. The Yellow and Green Workers are being used extensively, while Blue mostly sits idle.

If Ray naively balanced the load by allocating the same number of Actors to every node, I wouldn’t be seeing such unequal usage. Given my problem, I’m assuming Ray is using some other method?

What method is it using for distributing Actors to Workers and can I change it?

parallelAllTheThings · November 30, 2021, 2:30pm

Ray and other systems usually try to schedule tasks locally, to avoid the cost of moving data. That said, you can specify resources for your actors (number of CPUs/GPUs, memory, or a custom resource) or use the placement group API. Using either of these should spread the actors more evenly across your cluster.
