How severe does this issue affect your experience of using Ray?
- High: It blocks me to complete my task.
We have deployed transformer model using Ray serve and kuberay in EKS g4dn.xlarge instances.
Ray version: 2.31.0, kuberay version: 1.1.1
Sometimes we see high latency in out deployments and while checking the logs in ray we are seeing the below warning messages.
WARNING 2024-08-28 22:19:19,516 proxy 172.10.53.232 35620804-965d-4ff6-b2d7-8b04d9765b85 /extract-wc-qa/extract-fields pow_2_scheduler.py:501 - Failed to get queue length from Replica(id='mywseixp', deployment='ExtractFields', app='extract_fields') within 1.0s. If this happens repeatedly it's likely caused by high network latency in the cluster. You can configure the deadline using the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable.
What can be cause of this warning, and how can we avoid it? We checked within ray head pod the ENV RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S
which is mentioned in the log doesn’t exists.