Understanding performance of Ray Serve

I have a very simple identity method hosted by Ray Serve:

from ray import serve

@serve.deployment(num_replicas=1, ray_actor_options={"num_cpus": 1, "num_gpus": 0})
class IdentityService:
    def __init__(self):
        pass

    @serve.batch(max_batch_size=64, batch_wait_timeout_s=0.01)
    async def handle_batch(self, inputs):
        # serve.batch collects the arguments of up to 64 concurrent calls
        # (or whatever arrives within the 10 ms wait window) into one list.
        print("Our input array has length:", len(inputs))
        return inputs

    async def __call__(self, request):
        return await self.handle_batch(request)

app = IdentityService.bind()
handle = serve.run(app)

Then I emulate as many concurrent requests as possible:

import asyncio

import torch

async def send_request():
    # INPUT_SIZE is defined elsewhere in my script.
    return await handle.remote(torch.randint(low=0, high=3, size=(INPUT_SIZE,)).float())

async def main():
    tasks = []
    for _ in range(10000):
        task = asyncio.create_task(send_request())
        tasks.append(task)

    return await asyncio.gather(*tasks)

# Top-level await requires a notebook / IPython;
# in a plain script, use asyncio.run(main()).
await main()
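
For reference, this is roughly how I time the run (a minimal sketch; the QPS figure is just the request count divided by the elapsed time):

import time

start = time.perf_counter()
results = await main()  # same coroutine as above, again in a notebook
elapsed = time.perf_counter() - start
# 10000 requests in ~35 s comes out to roughly 285 QPS.
print(f"{len(results)} requests in {elapsed:.1f}s -> {len(results) / elapsed:.0f} QPS")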

No matter which configuration options I change (num_replicas, num_cpus, max_concurrent_queries), I always get the same performance: the 10k requests are processed in ~35 s, and the batch sizes stay between 5 and 8.
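
For example, this is the kind of variation I have tried (the specific values here are illustrative, not the exact ones I used):

@serve.deployment(
    num_replicas=4,                # illustrative; I varied this
    max_concurrent_queries=100,    # illustrative; I varied this too
    ray_actor_options={"num_cpus": 2, "num_gpus": 0},
)
class IdentityService:
    ...  # same body as above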

Can you please explain how to increase performance?

You’re likely running into the proxy actor as a bottleneck. We run one proxy actor per node, so I’d recommend increasing the number of nodes and seeing if that alleviates the bottleneck.
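
One way to confirm there is a single proxy is to list the actors in the cluster (this assumes Ray 2.x with the state API installed, i.e. pip install "ray[default]"; the proxy's class name differs across versions, e.g. HTTPProxyActor or ProxyActor):

# Lists every actor in the cluster. You should see one proxy actor per node,
# plus one replica actor per deployment replica.
ray list actors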

Can you please explain what you mean by a node and how to increase the number of nodes? I run this code on my local machine with 16 CPUs, and changing num_cpus does not change performance.

I mean the number of machines. Each machine runs only one proxy, so that proxy is likely the bottleneck. If you try this workload with more machines (but the same number of replicas) and the max QPS goes up, then the proxy actor was indeed the bottleneck.
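
To add machines, start Ray on each one and attach it to the same cluster; a minimal sketch (the head-node IP below is hypothetical):

# On the head machine:
ray start --head --port=6379

# On each additional machine (10.0.0.1 stands in for the head node's IP):
ray start --address='10.0.0.1:6379'

After the workers join, running serve.run from the head node can place replicas across the cluster, and each node typically gets its own proxy actor.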

How can I debug this proxy and confirm that it really is the bottleneck? My CPU load is low, and the achieved request rate is quite low too. How can I fix this bottleneck on a local machine?