Failed to get queue length from Replica

Ritesh_K · August 29, 2024, 6:40am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

We have deployed a transformer model using Ray Serve and KubeRay on EKS g4dn.xlarge instances.
Ray version: 2.31.0, KubeRay version: 1.1.1

Sometimes we see high latency in our deployments, and when checking the Ray logs we see the warning message below.
WARNING 2024-08-28 22:19:19,516 proxy 172.10.53.232 35620804-965d-4ff6-b2d7-8b04d9765b85 /extract-wc-qa/extract-fields pow_2_scheduler.py:501 - Failed to get queue length from Replica(id='mywseixp', deployment='ExtractFields', app='extract_fields') within 1.0s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable.

What could be causing this warning, and how can we avoid it? We checked inside the Ray head pod, and the environment variable RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S mentioned in the log does not exist.

Gene · September 4, 2024, 9:49pm

This warning means it is taking too long to retrieve the queue length from the replica, so you should check the networking on your cluster. If this network delay is expected, you can adjust the timeout by setting the environment variable RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S. When it is not set, the default response deadline is 0.1s. See ray/python/ray/serve/_private/constants.py at 5463f5b99f81cbcfc7e2b02fc7143b638d3c5065 in ray-project/ray on GitHub.
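
As a rough illustration of the reply above, here is a minimal Python sketch of how an environment-variable override with a default typically behaves, which is also why the variable not existing in the head pod is expected. The helper names `queue_length_deadline_s` and `container_env_entry` are hypothetical and are not part of Ray. The 0.1s default comes from the reply, and the 0.5 value is only an example. It is an assumption here that the variable has to be set in the environment of the containers where the Serve proxy runs (for KubeRay, the container `env` list in the pod template) before Ray starts.

```python
import os
from typing import Mapping

ENV_NAME = "RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S"
DEFAULT_DEADLINE_S = 0.1  # default stated in the reply above


def queue_length_deadline_s(environ: Mapping[str, str] = os.environ) -> float:
    """Hypothetical helper: return the deadline in seconds.

    Falls back to the default when the variable is unset, so a missing
    variable is not an error.
    """
    return float(environ.get(ENV_NAME, DEFAULT_DEADLINE_S))


def container_env_entry(deadline_s: float) -> dict:
    """Hypothetical helper: build a Kubernetes container env entry.

    Kubernetes expects both name and value to be strings.
    """
    return {"name": ENV_NAME, "value": str(deadline_s)}


if __name__ == "__main__":
    # Effective deadline in the current process environment.
    print(queue_length_deadline_s())
    # Entry to add to the Ray containers' env list (0.5 is an example value).
    print(container_env_entry(0.5))
```

Raising the deadline only hides the warning. If the latency is not expected, the cluster networking and replica load are still worth checking.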
