Scaling Ray Serve efficiently

Tarun_Dugar1 · December 10, 2024, 10:48am

How severe does this issue affect your experience of using Ray?

High: It blocks me to complete my task.

Hello,
I am a novice and need help with the following topics. I am currently using Ray Serve to create a deployment on Kubernetes with the KubeRay operator. It needs to handle high throughput (up to 500k requests per minute, possibly higher) at low latency (up to 15 ms). I have a lightweight model that takes around 3 ms for inference.

How can I scale up worker nodes automatically when one node isn’t able to handle the load?
Even if I manually scale up the worker node, if I don’t have a pod scheduled on it, it still gives the error "readiness probe failed: success". How can I create a worker node that stays on standby without continually failing?
I cannot use GPUs. What configuration (number of worker pods, autoscaling config, number of replicas) would you recommend from your experience for such workloads?
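
As a rough starting point for the replica question, the figures above (500k requests per minute, about 3 ms per inference) can be turned into a back-of-the-envelope estimate with Little's law. This is only a sketch under stated assumptions: each replica handles one request at a time, the 3 ms covers all per-request work, and the 70% utilization target and the doubling used for `max_replicas` are illustrative choices, not recommendations from Ray. The dictionary keys follow Ray Serve's `autoscaling_config` option names as I understand them and should be checked against the installed Ray version.

```python
import math


def estimate_replicas(requests_per_minute, service_time_ms, target_utilization=0.7):
    """Estimate replica counts with Little's law (in-flight = rate * service time).

    Assumes each replica processes one request at a time.
    Returns (busy_floor, with_headroom).
    """
    # Average number of requests in flight at any moment.
    concurrency = requests_per_minute * service_time_ms / 60_000
    busy_floor = math.ceil(concurrency)  # replicas needed at 100% utilization
    with_headroom = math.ceil(concurrency / target_utilization)
    return busy_floor, with_headroom


floor, sized = estimate_replicas(500_000, 3)
print(floor, sized)  # 25 36

# Hypothetical starting values; tune against real load tests.
autoscaling_config = {
    "min_replicas": floor,
    "max_replicas": 2 * sized,       # illustrative ceiling for "can go higher"
    "target_ongoing_requests": 1,    # assumes one in-flight request per replica
}
print(autoscaling_config)
```

Under these assumptions the load keeps about 25 replicas fully busy, so roughly 36 single-request CPU replicas would leave some headroom for the 15 ms latency budget. Network, serialization, and proxy overhead are not included and would raise the number.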

Sorry if the questions are too basic. Really appreciate any help.
