How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hello,
I am a novice and need help with the following topics. I am currently using Ray Serve to create a deployment on Kubernetes with the KubeRay operator. It needs to handle high throughput (up to 500k requests per minute, possibly higher) at low latency (up to 15 ms). I have a lightweight model that takes around 3 ms for inference.
- How can I scale up worker nodes automatically when one node isn’t able to handle the load?
- Even if I manually scale up the worker nodes, a worker node that has no pod scheduled on it still gives the error "readiness probe failed: success". How can I create a worker node that stays on standby without repeatedly failing?
- I cannot use GPUs. What configuration (number of worker pods, autoscaling config, number of replicas) would you recommend from your experience for such workloads?
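For context, here is a rough back-of-envelope calculation of the load described above. It is only a sketch: it assumes each replica handles one request at a time, with no batching and no overhead beyond the 3 ms inference. The `replicas_needed` helper and the 70% utilization target are my own illustrative assumptions, not measured values.

```python
import math


def replicas_needed(requests_per_minute, inference_ms, target_utilization=1.0):
    """Estimate how many single-request-at-a-time replicas the load needs.

    Assumes no batching and no overhead beyond the inference time itself.
    """
    # Total milliseconds of inference work arriving every second.
    busy_ms_per_second = requests_per_minute * inference_ms / 60
    # Each replica offers 1000 ms of work per second, scaled by the
    # fraction of time we are willing to keep it busy.
    return math.ceil(busy_ms_per_second / (1000 * target_utilization))


# Figures from this post: 500k requests per minute, 3 ms per inference.
print(replicas_needed(500_000, 3.0))       # 25 replicas at 100% utilization
# Hypothetical headroom: keep each replica at most 70% busy.
print(replicas_needed(500_000, 3.0, 0.7))  # 36 replicas
```

So even in the ideal case the workload seems to need around 25 CPU replicas, which is why I expect to need more than one worker node.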
Sorry if the questions are too basic. Really appreciate any help.