How to do Load Balancing?

shrekris · September 3, 2024, 5:31am

it looks like that currently all traffic is routed to the first replica and the second is idle all the time although the first replica could use some relief as it is under heavy fire and at >90% usage.

Proxy actors forward requests to deployment replicas using a ServeHandle. Usually that means replicas get roughly even load, because the ServeHandle uses power-of-two-choices: it samples two replicas at random and sends the request to the one with fewer ongoing requests.
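To make power-of-two-choices concrete, here is a minimal toy sketch in plain Python. It is not Serve's actual router: `pick_replica`, the replica names, and the assumption that requests never finish are all made up for illustration.

```python
import random

def pick_replica(ongoing, rng=random):
    """Sample two distinct replicas at random and return the less loaded one.

    `ongoing` maps a (hypothetical) replica name to its number of ongoing requests.
    """
    if len(ongoing) == 1:
        return next(iter(ongoing))
    a, b = rng.sample(sorted(ongoing), 2)
    return a if ongoing[a] <= ongoing[b] else b

# Toy model: 1000 requests arrive and none of them ever completes.
ongoing = {"replica-1": 0, "replica-2": 0, "replica-3": 0, "replica-4": 0}
rng = random.Random(0)
for _ in range(1000):
    ongoing[pick_replica(ongoing, rng)] += 1

print(ongoing)  # the four counts end up close to each other
```

Even though only two replicas are inspected per request, the more loaded of the pair is never chosen, which is what keeps the counts close together.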

As an optimization, the ServeHandles in the proxy actor first perform power-of-two-choices across replicas only on the same node. That way, requests can be fulfilled without requiring cross-node communication. If no replicas are on the same node (or if all replicas on the same node are busy), then the proxy falls back to replicas on other nodes.

Since you’re running with a proxy on only one node, this is likely why only one of your replicas is getting traffic. The replica on the proxy’s node is never saturated (i.e. its number of ongoing requests is always lower than max_ongoing_requests), so the proxy keeps sending it traffic.
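The locality preference and its effect on a single-proxy setup can be sketched roughly as follows. This is a simplified model under stated assumptions, not Serve's real implementation: `choose_replica`, the dict layout, and the node names are hypothetical, and "busy" is modelled as having reached `max_ongoing_requests`.

```python
import random

def choose_replica(replicas, proxy_node, max_ongoing_requests, rng=random):
    """Prefer replicas on the proxy's node; fall back to other nodes if none has capacity."""
    def has_capacity(candidates):
        return [r for r in candidates if r["ongoing"] < max_ongoing_requests]

    local = has_capacity([r for r in replicas if r["node"] == proxy_node])
    pool = local or has_capacity(replicas)  # fallback: replicas on any node
    if not pool:
        return None  # every replica is saturated
    # Power-of-two-choices within the selected pool.
    sampled = rng.sample(pool, min(2, len(pool)))
    return min(sampled, key=lambda r: r["ongoing"])

# One proxy on node-a, one replica per node (hypothetical layout).
replicas = [
    {"name": "replica-1", "node": "node-a", "ongoing": 0},
    {"name": "replica-2", "node": "node-b", "ongoing": 0},
]
picks = [choose_replica(replicas, "node-a", 5)["name"] for _ in range(100)]
print(set(picks))
```

As long as the local replica stays below `max_ongoing_requests`, the remote replica never enters the candidate pool, which matches the behaviour described in the question.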

Could you enable proxy actors on all nodes, and balance requests across them? That way, the traffic is spread out more evenly.

must I implement load balancing on my own, e.g., using nginx, such that also the second replica is utilized?

There are two places where load balancing should happen:

Across proxy actors: this must be implemented outside of Serve. For example, you could use nginx here to balance requests across all the different proxy actors.
Across deployment replicas: this happens out-of-the-box in the ServeHandle. When a ServeHandle receives a request, it selects a replica using power-of-two-choices and sends the request to that replica using a Ray actor call.
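For the first point, any external balancer works. As a stand-in for nginx, here is a minimal client-side round-robin sketch in Python. The proxy addresses are hypothetical placeholders (8000 is Serve's default HTTP port), and `next_proxy_url` is a made-up helper rather than part of Serve.

```python
import itertools

# Hypothetical addresses: one Serve proxy per node.
PROXY_URLS = [
    "http://10.0.0.1:8000",
    "http://10.0.0.2:8000",
]

_proxy_cycle = itertools.cycle(PROXY_URLS)

def next_proxy_url(path="/"):
    """Return the URL for the next request, rotating through the proxies."""
    return next(_proxy_cycle) + path
```

Each call to `next_proxy_url("/predict")` yields a URL on the next proxy in the list, so consecutive requests alternate between nodes. A real setup would also need health checks and retries, which nginx or a cloud load balancer provides out of the box.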
