Ray serve deployment is not scaling up, ongoing request is always 0

heiru · January 18, 2024, 12:44pm

How severe does this issue affect your experience of using Ray?

High: It blocks me to complete my task.

Hi Ray team, I have deployed a Ray cluster with KubeRay and successfully served a simple synchronous YOLOv7 model in PyTorch over gRPC. It processes requests without any issue, but I can’t get the upscaling part of autoscaling to work.

The symptoms are:

ray_serve_num_ongoing_grpc_requests keeps piling up, meaning the requests are all queued at the proxy
ray_serve_replica_processing_queries is always 0
ray_serve_replica_pending_queries is always 0

My deployment config is as follows:

max_concurrent_queries: 10
user_config: null
autoscaling_config:
  min_replicas: 0
  initial_replicas: 1
  max_replicas: 8
  target_num_ongoing_requests_per_replica: 1
  metrics_interval_s: 2
  look_back_period_s: 4
  smoothing_factor: 0.8
  upscale_smoothing_factor: 0.8
  downscale_smoothing_factor: 0.3
  downscale_delay_s: 600
  upscale_delay_s: 10
graceful_shutdown_wait_loop_s: 2
graceful_shutdown_timeout_s: 20
health_check_period_s: 10
health_check_timeout_s: 30

The replicas even scale down to 0 while many requests are still ongoing, because ray_serve_replica_processing_queries is 0.
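
To illustrate why a metric stuck at 0 would produce this behavior, here is a rough sketch. It assumes the autoscaler sizes the deployment from the ongoing-request count reported by the replicas, and it deliberately ignores the smoothing factors and the upscale/downscale delays. `desired_replicas` is a hypothetical helper written for this post, not a Ray API.

```python
import math


def desired_replicas(total_ongoing_requests: float,
                     target_per_replica: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified replica-count estimate (assumption: no smoothing, no delays)."""
    # Number of replicas needed so each one handles about target_per_replica requests.
    raw = math.ceil(total_ongoing_requests / target_per_replica)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# Values taken from the config above: target 1, min 0, max 8.
idle = desired_replicas(0, 1, 0, 8)    # replicas report 0 ongoing requests
busy = desired_replicas(20, 1, 0, 8)   # what 20 visible ongoing requests would give
print(idle, busy)
```

Under these assumptions, a reported count of 0 always maps to min_replicas, no matter how many requests are waiting at the proxy, which matches what I observe.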

I am not sure if it’s because I’m serving the endpoint with gRPC. Any help is welcome, as I’ve been scratching my head over this for quite some time.
Thanks!

psydok · April 18, 2024, 7:04pm

Hi, can you please tell me if you managed to solve the problem? Do you also have the YOLO model converted to ONNX?
