I'm hosting a very simple service:
import requests
from typing import Dict

from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=1, max_concurrent_queries=500)
class MyModelDeployment:
    def __init__(self, msg: str):
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}

app = MyModelDeployment.bind(msg="Hello world!")
serve.run(app, route_prefix="/")
print(requests.get("http://localhost:8000/").json())  # it works
I use hey to test performance:
hey -n 200 -c 100 http://localhost:8000/   # works
hey -n 200 -c 200 http://localhost:8000/   # fails; the error distribution reports:
[1] Get "http://localhost:8000/": read tcp 127.0.0.1:51044->127.0.0.1:8000: read: connection reset by peer
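For reference, here's a rough Python equivalent of the failing run, reusing the requests library from the snippet above. Its concurrency model (threads) differs from hey's, so treat it as an approximation rather than an exact reproduction:

from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/"  # same endpoint as above

def one_request(i: int) -> str:
    try:
        return str(requests.get(URL, timeout=10).status_code)
    except requests.RequestException as exc:
        return type(exc).__name__  # e.g. ConnectionError on a reset

# 200 requests with up to 200 in flight at once, like `hey -n 200 -c 200`.
with ThreadPoolExecutor(max_workers=200) as pool:
    results = list(pool.map(one_request, range(200)))

print({r: results.count(r) for r in set(results)})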
Why can't I send more than 100 requests in parallel to Ray Serve?
This is unexpected; the replica and proxy should be able to handle that load. Do the proxy or replica logs show any failures? I wonder if this might be a limitation of trying to open and sustain 200 connections on a single machine.
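One quick thing to check (a minimal sketch, POSIX-only) is the per-process open-file limit on both the hey client and the Serve process, since every open connection consumes a file descriptor:

import resource

# Each TCP connection uses one fd on each end; a low soft limit on either
# side could plausibly show up as resets once concurrency gets high enough.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE soft={soft} hard={hard}")

If the soft limit is something like 256 (a common macOS default), 200 concurrent connections plus Serve's own sockets could plausibly exhaust it.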
I don't see any errors. It processes N requests and then breaks.
If there are no errors, then this may be a limitation on the machine's ability to open and sustain 200 connections to itself. Could you try the following experiment:
- Run 2 client machines that run hey and 1 server machine that runs Serve.
- Run hey with 100 clients on each client machine and send the requests to the server machine.

If that succeeds, then the root cause is likely the single machine.
Can I emulate this experiment (2 client machines and 1 server) on a single machine (e.g., by treating 1 core as a machine)?
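For example, something like this sketch is what I have in mind (two worker processes standing in for the two client machines; note that both processes still share one loopback interface and ephemeral-port range, so this may not reproduce having separate network stacks):

import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/"

def client_machine(n_clients: int, n_requests: int) -> int:
    # One "machine" = one OS process driving n_clients concurrent workers.
    # CPU pinning (e.g. os.sched_setaffinity on Linux) could be layered on,
    # but it would not change the shared network stack.
    def one_request(_: int) -> bool:
        try:
            return requests.get(URL, timeout=10).status_code == 200
        except requests.RequestException:
            return False

    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        return sum(pool.map(one_request, range(n_requests)))

if __name__ == "__main__":
    # Two emulated machines, 100 concurrent clients each, like the experiment.
    with mp.Pool(processes=2) as machines:
        ok_counts = machines.starmap(client_machine, [(100, 100), (100, 100)])
    print(f"successful responses per emulated machine: {ok_counts}")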