We are deploying an inference cluster using Serve on AWS and are starting to distribute requests among the workers' own http_proxy servers. Requests are routed across all workers and the head node by our own external AWS load balancer. This load balancer is fairly 'dumb' and has no knowledge of the number of ongoing requests on each worker node. When a request arrives at a worker node's http_proxy, is it routed onward to another worker? How do all of these http_proxy servers know which workers to route to?
Short answer: the http_proxy on the worker the LB routed to will pick a replica to route the request to. The replica can be on any worker or the head node, as long as it still accepts requests.
http_location tells Ray where to start the http_proxy. If it's set to all, then every node, head or worker, can accept requests on its HTTP port. Once a request hits a node, that node's http_proxy knows which replicas are running and where. By default it uses a power-of-two-choices algorithm: pick two replicas at random and send the traffic to the one with the shorter request queue. You can read more here: https://github.com/ray-project/ray/blob/master/python/ray/serve/_private/router.py#L263
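To make the setup concrete, here is a hedged sketch of starting Serve with a proxy on every node so an external load balancer can target any node's HTTP port. The option name and value (`http_options={"location": "EveryNode"}`) are an assumption based on the http_location setting described above; check the docs for your Ray version.

```python
# Sketch only: start Serve with an http_proxy on every node.
# Assumes serve.start accepts an http_options dict with a "location"
# key whose "EveryNode" value maps to the http_location=all behavior
# described above.
from ray import serve

serve.start(http_options={"location": "EveryNode", "port": 8000})
```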
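The power-of-two-choices policy linked above can be sketched in a few lines. This is a simplified illustration, not Ray's actual router code: the function name and the plain dict of queue lengths are hypothetical stand-ins.

```python
import random

def power_of_two_choices(replicas, queue_lens):
    """Illustrative sketch of power-of-two-choices routing: sample two
    replicas uniformly at random, then send the request to whichever of
    the two has the shorter queue. Not Ray's actual implementation."""
    a, b = random.sample(replicas, 2)
    return a if queue_lens[a] <= queue_lens[b] else b

# With two replicas, the less-loaded one always wins the comparison.
queues = {"r1": 4, "r2": 0}
assert power_of_two_choices(list(queues), queues) == "r2"
```

The appeal of this policy is that it avoids a full scan of every replica's queue on each request, yet it dramatically outperforms purely random assignment, since the more loaded replica loses every pairwise comparison it appears in.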