I have some questions:
How do I set the max concurrency for a deployment? What happens when Ray Serve receives, for example, 300 requests at the same time? Are the requests held pending in a queue?
When I send 600 requests at the same time (each request has a 150 KB payload), I see in the dashboard that the HTTPProxyActor's CPU utilization rises to 100% and all requests get stuck with no response. How can I increase the number of HTTPProxyActor processes on each node?
Maybe I have to set the max concurrency in the service mesh or on the FastAPI ingress endpoint?
I see in the Ray dashboard that the memory (RAM) usage of some actors keeps increasing. Do I have to write a memory-monitor function that triggers garbage collection or kills the process?
In "Pattern: Http endpoint for dag graph — Ray 3.0.0.dev0" I see version 3.0.0.dev0, but on GitHub I see release 1.12.1, so I am confused. Is the master branch version 3.0.0 beta?
max_concurrent_queries is the maximum number of queries that each deployment replica can handle at once. However, this number is counted from each RayServeHandle's point of view: each handle checks only the number of queries that it is itself processing and compares that to the max concurrency. With several handles sending to the same replica, the replica can receive more than this number, so it’s not a hard limit.
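To illustrate why a per-handle count is not a hard limit, here is a toy model in plain Python. It is not Ray Serve's implementation: `ToyHandle` and `ToyReplica` are hypothetical names invented for this sketch, and the only behavior modeled is "each handle compares its own in-flight count to the limit".

```python
class ToyReplica:
    """Toy replica that just counts queries it is currently running."""

    def __init__(self):
        self.active = 0  # queries on the replica, summed over all handles


class ToyHandle:
    """Toy handle that tracks only the queries it sent itself."""

    def __init__(self, replica, max_concurrent_queries):
        self.replica = replica
        self.max_concurrent_queries = max_concurrent_queries
        self.in_flight = 0  # queries sent by THIS handle and not yet finished

    def try_send(self):
        # The check looks only at this handle's own counter,
        # never at what other handles have sent.
        if self.in_flight >= self.max_concurrent_queries:
            return False  # this toy simply refuses; it does not queue
        self.in_flight += 1
        self.replica.active += 1
        return True


replica = ToyReplica()
handles = [ToyHandle(replica, max_concurrent_queries=2) for _ in range(3)]

# Every handle sends until it reaches its own limit.
for handle in handles:
    while handle.try_send():
        pass

print(replica.active)  # prints 6, although the per-handle limit is 2
```

In Ray 1.x the limit itself is passed to the deployment decorator, for example `@serve.deployment(max_concurrent_queries=10)`; check the docs for your version before relying on that parameter name.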
Here’s a guide on how to scale your HTTP server. Additionally, HTTPProxies use uvicorn as their server. You could also try replacing this with a higher throughput server. Here’s an example guide on how to replace uvicorn with other servers.
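As a rough sketch of the scaling option, Serve in Ray 1.x accepts an `http_options` dict in `serve.start()`, and its `location` field can be set to `"EveryNode"` to run one HTTP proxy per node (not several per node). The host, port, and the `start_serve` helper name below are assumptions for illustration, and the sketch assumes a Ray cluster is already running; you would still need a load balancer in front of the nodes.

```python
# Options for serve.start(). "EveryNode" asks Serve to start one HTTP
# proxy on every node instead of only on the head node.
HTTP_OPTIONS = {
    "host": "0.0.0.0",  # assumption: listen on all interfaces
    "port": 8000,       # assumption: default-style port
    "location": "EveryNode",
}


def start_serve():
    """Connect to a running cluster and start Serve with the options above."""
    # Imported here so the options can be inspected without Ray installed.
    import ray
    from ray import serve

    ray.init(address="auto")  # assumes an already running Ray cluster
    serve.start(detached=True, http_options=HTTP_OPTIONS)
```

Call `start_serve()` once from a driver script before deploying; whether this removes the bottleneck depends on how traffic is spread across the nodes.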
- Yes, you could try setting the max concurrency on your ingress endpoint to reduce the number of queries that your HTTPProxy has to handle. However, I’m not sure this will necessarily solve the issue. I believe the proxy will still need to read the incoming requests, so that may still be a bottleneck.
- You shouldn’t need to write a custom garbage collector. Ray uses reference counting to decide whether to keep objects in memory. Once an object has no more references, I believe Ray itself should garbage collect the object. Ray also has object spilling, which will spill data from memory to disk if needed. You can use the ray memory command to debug your Ray application’s memory usage.
- That’s right. The master branch’s version is currently 3.0.0, but the latest release is 1.12.1. Periodically, Ray cuts releases from the master branch and gives them a release version (e.g. 1.11, 1.12, etc.).
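For the ingress-side concurrency limit mentioned in the list above, one common approach is a semaphore in front of the downstream call, so that only a fixed number of requests are forwarded at once. The sketch below uses only `asyncio`; `MAX_IN_FLIGHT`, `forward`, and the sleep that stands in for the call to Serve are all hypothetical, and this caps what is forwarded rather than what the ingress itself has to read.

```python
import asyncio

MAX_IN_FLIGHT = 50  # hypothetical limit; tune for your workload


async def forward(request_id, limiter, stats):
    # Only MAX_IN_FLIGHT coroutines can be inside this block at once;
    # the rest wait here until a slot frees up.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # stand-in for the downstream request
        stats["active"] -= 1
        return request_id


async def main(num_requests=600):
    limiter = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(forward(i, limiter, stats) for i in range(num_requests))
    )
    return results, stats


results, stats = asyncio.run(main())
print(len(results), stats["peak"])
```

In a FastAPI ingress the same semaphore could be shared across request handlers, with excess requests either waiting or being rejected early.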