- High: It blocks me from completing my task.
Issue: Occasionally, calls to .remote() hang for much longer than normal.
Hi, before I open an issue on GitHub I'd like someone to sanity check my setup.
I have three services in my architecture.
1.) My own FastAPI server.
2.) A Request pre-processing Ray Serve deployment.
3.) A Compute Ray Serve deployment.
The FastAPI server is my own custom ingress endpoint, not using Ray Serve's ingress functionality. It connects to the existing Ray cluster via ray.init() on initialization. When the async request endpoint is hit, it calls serve.get_app_handle("RequestApplication").remote(request) to send the request to the Request deployment.
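Here's roughly what the FastAPI side looks like (the route name, payload shape, and cluster address are simplified placeholders, not my exact code):

```python
import ray
from fastapi import FastAPI
from ray import serve

app = FastAPI()

# Connect to the already-running Ray cluster when the server process starts
# ("auto" stands in for however the cluster address is actually supplied).
ray.init(address="auto")


@app.post("/submit")
async def submit_request(payload: dict):
    # Get a handle to the Request application and fire off the call.
    handle = serve.get_app_handle("RequestApplication")
    handle.remote(payload)  # intentionally not awaited
    return {"status": "submitted"}
```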
The Request deployment runs on the Ray head node and, based on the request it receives, forwards it to the correct Compute deployment via serve.get_app_handle(<compute app name>).remote(request). This happens in the deployment's async __call__ method.
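The routing logic is essentially this (model_key is a stand-in for my real routing field, and the app names are placeholders):

```python
from ray import serve


@serve.deployment
class RequestDeployment:
    async def __call__(self, request):
        # Pick the compute application based on the request contents
        # ("model_key" stands in for my real routing field).
        compute_handle = serve.get_app_handle(request.model_key)
        # Forward the request; again, the response is not awaited.
        compute_handle.remote(request)


# Bound and deployed as the "RequestApplication" app.
app = RequestDeployment.bind()
```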
The Compute deployment runs on a different Ray node, on a different machine from both the FastAPI server and the Ray head node. It also receives the request in its async __call__ method, executes the compute-heavy work, and handles sending the data back to the FastAPI server itself.
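And the compute side is roughly this (the compute function and the result-delivery URL are placeholders for my real logic):

```python
import ray
import requests
from ray import serve


@serve.deployment
class ComputeDeployment:
    async def __call__(self, request):
        # The large attribute was ray.put() by the FastAPI server, so here it
        # arrives as an ObjectRef and is pulled from the object store.
        payload = ray.get(request.payload_ref)

        result = self._run_compute(payload)

        # Send the result straight back to the FastAPI server over HTTP,
        # outside of Ray (placeholder URL; blocking call kept for brevity).
        requests.post("http://fastapi-host:8000/results", json={"result": result})

    def _run_compute(self, payload):
        # Stand-in for the real compute-heavy work.
        return len(payload)


app = ComputeDeployment.bind()
```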
Things to note:
- Neither the FastAPI server nor the Request deployment waits for the response from .remote(request); the Compute deployment handles the result separately from Ray.
- The request is a pydantic object that contains a large object as one of its attributes. The FastAPI server calls ray.put on that attribute, and the Compute deployment later calls ray.get on the resulting ObjectRef (see the sketch after this list).
- In the logs for the Request deployment, I consistently see “LongPollClient polling timed out. Retrying.” I'm not sure if this is a problem.
- In the FastAPI server I sometimes see: “WARNING 2024-09-24 09:51:14,270 serve 20 pow_2_scheduler.py:536 - Failed to get queue length from Replica(id=‘was0z9gr’, deployment=‘ModelDeployment’, app=‘Model:nnsight-models-languagemodel-languagemodel-repo-id-eleutherai-gpt-j-6b’) within 1.0s. If this happens repeatedly it’s likely caused by high network latency in the cluster. You can configure the deadline using the RAY_SERVE_QUEUE_LENGTH_RESPONSE_DEADLINE_S environment variable.”
- This is often followed by: “concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f118c190910 state=cancelled>”
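For context, the payload passing looks roughly like this, assuming pydantic v2 (RequestModel and the field names are placeholders for my actual model):

```python
import ray
from pydantic import BaseModel, ConfigDict


class RequestModel(BaseModel):
    # Allow ray.ObjectRef as a field type; pydantic can't validate it itself.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    model_key: str
    payload_ref: ray.ObjectRef  # the large attribute, replaced by its ObjectRef


# FastAPI server side: put the large object into the object store once...
large_object = b"x" * 100_000_000  # stand-in for the real large attribute
request = RequestModel(
    model_key="SomeComputeApp",
    payload_ref=ray.put(large_object),
)

# ...Compute deployment side: fetch it back only where it's needed.
payload = ray.get(request.payload_ref)
```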
Is there anything I need to do given that I never wait for the results of the .remote() calls? Should I be “closing” the DeploymentHandles in some way after I've sent data via their .remote() method?
Sorry for the long post. Please let me know if there are any logs or other information I can provide.