I have a few questions about Ray Serve:
-
How do I set the maximum concurrency for a deployment? What happens when Ray Serve receives, for example, 300 requests at the same time? Are the excess requests held pending in a queue?
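To make the question concrete, here is the behaviour I have in mind, sketched with plain asyncio rather than Ray Serve. This is only an illustration under my own assumptions: `MAX_CONCURRENCY`, `handle`, and `run_burst` are hypothetical names, not Ray Serve APIs, and I do not know whether Serve actually works this way.

```python
import asyncio

MAX_CONCURRENCY = 4    # hypothetical per-deployment limit (assumption)
TOTAL_REQUESTS = 300   # the burst size from the question


async def handle(request_id, limiter, stats):
    # Requests beyond the limit wait here, i.e. they are "pending in a queue".
    async with limiter:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.001)  # stand-in for the real request work
        stats["in_flight"] -= 1
        stats["done"] += 1


async def run_burst(total=TOTAL_REQUESTS, limit=MAX_CONCURRENCY):
    # At most `limit` handlers run at once; the rest block on the semaphore.
    limiter = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0, "done": 0}
    await asyncio.gather(*(handle(i, limiter, stats) for i in range(total)))
    return stats


if __name__ == "__main__":
    print(asyncio.run(run_burst()))
```

In this sketch every request eventually completes, but never more than `MAX_CONCURRENCY` run at once. Is that what a concurrency limit on a deployment does, or are excess requests rejected?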
-
When I send 600 requests at the same time (each with a 150 KB payload), the dashboard shows the HTTPProxyActor's CPU utilization rising to 100%, and all requests get stuck with no response. How can I increase the number of HTTPProxyActor processes on each node?
-
Or should I set the maximum concurrency in the service mesh or on the FastAPI ingress endpoint instead?
-
In the Ray dashboard I see that the memory (RAM) usage of some actors keeps growing. Do I have to write my own memory monitor function that triggers garbage collection or kills the process?
-
The docs page "Pattern: Http endpoint for dag graph" (Ray 3.0.0.dev0) shows version 3.0.0.dev0, but on GitHub I see release 1.12.1, so I am confused. Is the master branch a 3.0.0 beta?