Max concurency for deployment

Pitos · June 5, 2022, 9:38am

I have some questions

How to set max concurrency for deployment. What happend when ray serve received for example 300 requests same time requestes are pending in a queue?
When I send 600 requests (each request has 150KB payload) in same time in dashboard I see that httpproxyactor increase cpu utilization to 100% and all requests stuck, there is no response. How to increase httpproxyactor process on each node?
Maybe I have to set max concurrency in service mesh or fastapi ingress endpoint?
I see in ray dashboard that some actors increase memory ram. So I have to write function memory monitor for garbage cleaner or kill process ?
In Pattern: Http endpoint for dag graph — Ray 3.0.0.dev0 i see version 3.0.0-dev 0 but in github I see release 1.12.1 i am confussed. So master branch is version 3.0.0 beta?

shrekris · June 6, 2022, 5:46pm

max_concurrent_queries is the maximum number of queries that each deployment replica can handle. However, this number is counted from each RayServeHandle's point of view. This means each handle checks only the number of queries that it itself is processing and compares it to the max concurrency, so it’s not a hard limit.
Here’s a guide on how to scale your HTTP server. Additionally, HTTPProxies use uvicorn as their server. You could also try replacing this with a higher throughput server. Here’s an example guide on how to replace uvicorn with other servers.
Yes, you could try setting the max concurrency on your ingress endpoint to reduce the number of queries that your HTTPProxy have to handle. However, I’m not sure if this will necessarily solve the issue. I believe the proxy will still need to read the incoming requests, so that may still be a bottleneck.
You shouldn’t need to write a custom garbage collector. Ray uses ref counting to decide whether to keep objects in memory. Once an object has no more references, I believe Ray itself should garbage collect the object. Ray also has object spilling, which will spill data from memory to disk if needed. You can use the ray memory command to debug your Ray application’s memory usage.
That’s right. The master branch’s version is currently 3.0.0, but the latest release is 1.12.1. Periodically, Ray cuts releases from the master branch and gives them a release version (e.g. 1.11, 1.12, etc.).

Topic		Replies	Views
How to ensure ray serve using max replicas possible Ray Serve	3	608	October 19, 2023
Can we customize the behavior of Ray Serve when max_concurrent_requests is reached? Ray Serve	1	317	December 29, 2023
Understanding performance of Ray serve Ray Serve	4	667	January 10, 2024
Ray serve autoscaling queue size Ray Serve	5	1342	May 24, 2022
Why there is no possibility to call more than 100 requests in parallel to Ray Serve? Ray Serve	4	259	January 10, 2024