Ray Serve with FastAPI and Serve batching: client request cancellation

High: It blocks me from completing my task.

I deployed a Ray Serve deployment that uses FastAPI and Serve batching on a Ray cluster. I invoke the deployed Ray Serve endpoint from my local machine using the client script below:

import ray
import requests

# ray.init("ray://127.0.0.1:10001")

@ray.remote
def send_query(value):
    # print(f"value: {value}")
    resp = requests.post("http://127.0.0.1:8000/test/serving_request/", json=value)
    return resp.text

# `names` (the list of request payloads) is defined elsewhere in the script.
results = [send_query.remote(text) for text in names]
print("Result returned:", results)

for result in results:
    print(result)  # prints the ObjectRef, not the response
    print(ray.get(result))  # blocks until the task returns the response text

  1. When I run the script for the first time after deploying to Ray Serve, I can send multiple requests, up to 8 at a time.
  2. After those 8 requests complete, I can no longer invoke the endpoint at all. ray.get fails with the error below.

During handling of the above exception, another exception occurred:

ray::send_query() (pid=30499, ip=10.200.3.133)
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py”, line 715, in urlopen
httplib_response = self._make_request(
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py”, line 416, in _make_request
conn.request(method, url, **httplib_request_kw)
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/connection.py”, line 244, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File “/home/ray/anaconda3/lib/python3.9/http/client.py”, line 1285, in request
self._send_request(method, url, body, headers, encode_chunked)
File “/home/ray/anaconda3/lib/python3.9/http/client.py”, line 1331, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File “/home/ray/anaconda3/lib/python3.9/http/client.py”, line 1280, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File “/home/ray/anaconda3/lib/python3.9/http/client.py”, line 1040, in _send_output
self.send(msg)
File “/home/ray/anaconda3/lib/python3.9/http/client.py”, line 980, in send
self.connect()
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/connection.py”, line 205, in connect
conn = self._new_conn()
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/connection.py”, line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f52e007ec10>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

ray::send_query() (pid=30499, ip=10.200.3.133)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/adapters.py”, line 486, in send
resp = conn.urlopen(
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/connectionpool.py”, line 801, in urlopen
retries = retries.increment(
File “/home/ray/anaconda3/lib/python3.9/site-packages/urllib3/util/retry.py”, line 594, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=‘127.0.0.1’, port=8000): Max retries exceeded with url: /test/serving_request/ (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7f52e007ec10>: Failed to establish a new connection: [Errno 111] Connection refused’))

During handling of the above exception, another exception occurred:

ray::send_query() (pid=30499, ip=10.200.3.133)
File “/tmp/ray/session_2025-01-01_19-24-17_406163_1/runtime_resources/working_dir_files/_ray_pkg_3182eb4dff41d75f/ray_serve_request.py”, line 56, in send_query
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/api.py”, line 115, in post
return request(“post”, url, data=data, json=json, **kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/api.py”, line 59, in request
return session.request(method=method, url=url, **kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/sessions.py”, line 589, in request
resp = self.send(prep, **send_kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/sessions.py”, line 703, in send
r = adapter.send(request, **kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/adapters.py”, line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=‘127.0.0.1’, port=8000): Max retries exceeded with url: /test/serving_request/ (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7f52e007ec10>: Failed to establish a new connection: [Errno 111] Connection refused’))


Job ‘raysubmit_ft6xgHREFFRxpxNi’ failed

Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
return request(“post”, url, data=data, json=json, **kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/api.py”, line 59, in request
return session.request(method=method, url=url, **kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/sessions.py”, line 589, in request
resp = self.send(prep, **send_kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/sessions.py”, line 703, in send
r = adapter.send(request, **kwargs)
File “/home/ray/anaconda3/lib/python3.9/site-packages/requests/adapters.py”, line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=‘127.0.0.1’, port=8000): Max retries exceeded with url: /test/serving_request/ (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7f52e007ec10>: Failed to establish a new connection: [Errno 111] Connection refused’))
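
The traceback above shows send_query running on a cluster node (ip=10.200.3.133), so 127.0.0.1:8000 refers to that node's loopback address rather than to the machine that launched the script. A minimal sketch of a helper that reports where each request ran, and returns the error instead of raising, might look like the following. The name post_json and its return fields are hypothetical, and it uses the standard-library urllib in place of requests only to stay dependency-free.

```python
import json
import socket
import urllib.error
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST payload as JSON; return a dict instead of raising on network errors."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    host = socket.gethostname()  # which machine actually sent the request
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
            return {"ok": True, "host": host, "status": resp.status, "body": body}
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        return {"ok": False, "host": host, "status": exc.code, "error": str(exc)}
    except OSError as exc:
        # Covers URLError (for example, connection refused) and socket timeouts.
        return {"ok": False, "host": host, "status": None, "error": repr(exc)}
```

Returning post_json(url, value) from send_query would let ray.get(result) print the host and the error for every task, which makes it easier to see whether failures correlate with the node a task landed on.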

I have been trying different things for the past 3 to 4 days, but I am unable to find the cause of the issue.

NOTE: I have max_ongoing_requests=30 and target_ongoing_requests=2, and autoscaling is enabled on the cluster.

I can see that the request reaches the replica and the replica is able to process it, but the client gets disconnected at ray.get, so the request is cancelled in the Ray cluster.

Following the documentation (Set Up FastAPI and HTTP, Ray 2.40.0), I wrapped the call in asyncio.shield and added a try/except block as well, but the issue persists. Please help, as I am currently stuck.
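
For reference, the general asyncio pattern behind the shield-plus-try/except approach can be sketched without Ray at all. This only illustrates standard asyncio semantics and is not the deployment code from this post; slow_work, handler, and the simulated disconnect are hypothetical stand-ins.

```python
import asyncio

# Keep strong references so shielded tasks are not garbage-collected early.
_background = set()


async def slow_work(log):
    # Stand-in for the real request processing.
    await asyncio.sleep(0.1)
    log.append("work finished")
    return "result"


async def handler(log):
    inner = asyncio.ensure_future(slow_work(log))
    _background.add(inner)
    inner.add_done_callback(_background.discard)
    try:
        # shield() lets `inner` keep running even if this coroutine is cancelled.
        return await asyncio.shield(inner)
    except asyncio.CancelledError:
        log.append("caller cancelled")
        raise  # propagate the cancellation to whoever cancelled us


async def main():
    log = []
    task = asyncio.create_task(handler(log))
    await asyncio.sleep(0.01)
    task.cancel()  # simulates the client disconnecting mid-request
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.2)  # give the shielded work time to complete
    return log
```

Shielding only keeps the server-side work running to completion; the caller that was cancelled still does not receive a response.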

ERROR:

I see this error in the proxy logs.

Replica(id=‘awlg4nty’, deployment=‘PredictionActor’, app=‘default’) within 0.1s. If this happens repeatedly it’s likely caused by high network latency in the cluster. You can configure the deadline using the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable.

INFO 2025-01-02 19:33:54,343 proxy 10.200.2.194 203dd7e9-9459-4c21-aa4b-469518ded1e5 – Client for request 203dd7e9-9459-4c21-aa4b-469518ded1e5 disconnected, cancelling request.

Can someone tell me how to set the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable, and what the recommended value is?
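
A minimal sketch of one way the variable could be passed to Ray, under the assumption that Serve reads it from the process environment when its processes start, so it would need to be present before `ray start` runs on each node. The helper name env_with_deadline and the value 1.0 are illustrative only, not a recommended setting.

```python
import os

ENV_VAR = "RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S"


def env_with_deadline(seconds, base_env=None):
    """Return a copy of the environment with the queue-length deadline set."""
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = str(float(seconds))  # environment values must be strings
    return env


# Hypothetical usage: start the Ray head node with the variable applied.
# import subprocess
# subprocess.run(["ray", "start", "--head"], env=env_with_deadline(1.0), check=True)
```

The same key and value could instead be set in whatever mechanism launches the Ray nodes (shell profile, container spec, and so on); the sketch only shows the shape of the setting.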