How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi Ray team, I have deployed a Ray cluster with KubeRay and successfully served a simple synchronous YOLOv7 PyTorch model over gRPC. It processes requests without any issue, but I can't get the upscaling part of autoscaling to work.
The symptoms are:
- I can see `ray_serve_num_ongoing_grpc_requests` piling up, meaning the requests are all queued at the proxy
- `ray_serve_replica_processing_queries` is always 0
- `ray_serve_replica_pending_queries` is always 0
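For completeness, this is the small helper I use to pull those gauges out of the Prometheus text endpoint (a sketch; the endpoint URL and port depend on how your cluster exports metrics, so they are not shown):

```python
def ray_serve_samples(prom_text: str) -> dict:
    """Map each ray_serve_* sample line (metric name + labels) to its value,
    given Prometheus text-exposition output."""
    samples = {}
    for line in prom_text.splitlines():
        if line.startswith("ray_serve_"):
            # Prometheus text format: `name{labels} value` -- split off the
            # trailing value and keep the name+labels part as the key.
            name, _, value = line.rpartition(" ")
            samples[name] = float(value)
    return samples
```

Usage would be something like `ray_serve_samples(urllib.request.urlopen("http://<node>:<metrics-port>/metrics").read().decode())`, with the host and port filled in for your setup.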
My deployment config is as follows:

```yaml
max_concurrent_queries: 10
user_config: null
autoscaling_config:
  min_replicas: 0
  initial_replicas: 1
  max_replicas: 8
  target_num_ongoing_requests_per_replica: 1
  metrics_interval_s: 2
  look_back_period_s: 4
  smoothing_factor: 0.8
  upscale_smoothing_factor: 0.8
  downscale_smoothing_factor: 0.3
  downscale_delay_s: 600
  upscale_delay_s: 10
graceful_shutdown_wait_loop_s: 2
graceful_shutdown_timeout_s: 20
health_check_period_s: 10
health_check_timeout_s: 30
```
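For reference, here is the same set of autoscaling settings as it would be passed through Ray Serve's Python API instead of YAML (illustrative only; I deploy via the YAML config above, and the `YoloDeployment` class name below is a placeholder, not my actual code):

```python
# The autoscaling settings above, as the dict Ray Serve's Python API accepts.
# Key names mirror the YAML config.
autoscaling_config = {
    "min_replicas": 0,
    "initial_replicas": 1,
    "max_replicas": 8,
    "target_num_ongoing_requests_per_replica": 1,
    "metrics_interval_s": 2,
    "look_back_period_s": 4,
    "upscale_smoothing_factor": 0.8,
    "downscale_smoothing_factor": 0.3,
    "downscale_delay_s": 600,
    "upscale_delay_s": 10,
}

# Hypothetical decorator usage (class name is a placeholder):
# from ray import serve
#
# @serve.deployment(
#     max_concurrent_queries=10,
#     autoscaling_config=autoscaling_config,
# )
# class YoloDeployment:
#     ...
```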
The replicas even scale down to 0 while many requests are still ongoing, because `ray_serve_replica_processing_queries` is 0.
I am not sure if it's because I'm serving the endpoint over gRPC. Any help is welcome, as I've been scratching my head for quite some time.
Thanks!