1. Severity of the issue:
High: Completely blocks me.
2. Environment:
- Ray version: 2.43.0
- Python version: 3.12.7
- OS: Linux / macOS
- Cloud/Infrastructure: AWS
- Other libs/tools (if relevant): Locust
3. What happened vs. what you expected:
- Expected: RPS stays roughly constant with multiple concurrent users, since max_ongoing_requests is set to 1
- Actual: RPS drops as concurrent users increase
When I run this deployment and send it requests from a single locust user, I get about 9-10 RPS, as expected: the handler sleeps for 100 ms and max_ongoing_requests=1 serializes requests, so a single replica should top out at roughly 10 RPS.
When I run locust with 5 concurrent users, I expect the same throughput, since max_ongoing_requests is still 1 for this deployment. Instead, the RPS drops to about 6. We can reproduce the same drop in our production code with a different type of workload, and it is quite inexplicable. Can you please help explain / resolve this?
Deployment code:

```python
import time

from fastapi import FastAPI
from ray import serve

app = FastAPI()


@serve.deployment(max_ongoing_requests=1)
@serve.ingress(app)
class TestDeployment:
    @app.post("/invoke", name="invoke")
    def invoke(self):
        # Simulate 100 ms of work; with max_ongoing_requests=1 this
        # should cap a single replica at ~10 RPS.
        time.sleep(0.1)
        return "Hello, world!"


deployment = TestDeployment.bind()
```
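
For reference, I run the app locally via the Python API along these lines (a minimal sketch; the filename `deployment.py` is an assumption, and `serve run deployment:deployment` on the CLI behaves the same):

```python
# run_app.py - a minimal sketch, assuming the deployment code above
# is saved as deployment.py in the same directory.
import time

from ray import serve

from deployment import deployment

serve.run(deployment)  # the Serve HTTP proxy listens on http://127.0.0.1:8000 by default
while True:
    time.sleep(1)  # keep the driver alive so the app stays up
```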
Locust file:

```python
from locust import HttpUser, task


class LocustUser(HttpUser):
    @task
    def invoke(self):
        self.client.post("/invoke")
```
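
In case it helps rule out the load generator, here is a sketch of the same measurement without locust: each thread plays one "user" sending requests back-to-back (assumes the app is running at http://127.0.0.1:8000 and `requests` is installed):

```python
# measure_rps.py - a minimal sketch of the same measurement without locust.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1:8000/invoke"


def worker(deadline: float) -> int:
    # Mimic one locust user: send requests back-to-back until the deadline.
    count = 0
    while time.time() < deadline:
        requests.post(URL)
        count += 1
    return count


def measure(users: int, duration: float = 30.0) -> None:
    deadline = time.time() + duration
    with ThreadPoolExecutor(max_workers=users) as pool:
        total = sum(pool.map(worker, [deadline] * users))
    print(f"{users} users: {total / duration:.1f} RPS")


if __name__ == "__main__":
    measure(1)  # ~10 RPS expected: one 100 ms request at a time
    measure(5)  # should also hold ~10 RPS if max_ongoing_requests=1 serializes
```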