Failed to get queue length from Replica

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

We have deployed a transformer model using Ray Serve and KubeRay on EKS g4dn.xlarge instances.
Ray version: 2.31.0, KubeRay version: 1.1.1

Sometimes we see high latency in our deployments, and when checking the Ray logs we see the following warning messages:
WARNING 2024-08-28 22:19:19,516 proxy 172.10.53.232 35620804-965d-4ff6-b2d7-8b04d9765b85 /extract-wc-qa/extract-fields pow_2_scheduler.py:501 - Failed to get queue length from Replica(id='mywseixp', deployment='ExtractFields', app='extract_fields') within 1.0s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable.

What could be causing this warning, and how can we avoid it? We checked inside the Ray head pod, and the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable mentioned in the log doesn't exist.

The warning means it took too long to retrieve the queue length from the replica, so the first thing to check is the network latency within your cluster. If that delay is expected, you can raise the timeout by setting the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable. It is normal that the variable doesn't exist in your head pod: it is only an override, and when it is not set, the default response deadline of 0.1s is used. See ray/python/ray/serve/_private/constants.py at 5463f5b99f81cbcfc7e2b02fc7143b638d3c5065 · ray-project/ray · GitHub
