Hanging issue with serve.batch

I wrote a simple Ray Serve program with batching.

from fastapi import FastAPI
from pydantic import BaseModel
from ray import serve
from sentence_transformers import SentenceTransformer

# Placeholder values so the snippet is self-contained; my real config is similar.
model_id = "all-MiniLM-L6-v2"
max_batch_size = 32
batch_wait_timeout_s = 0.1

class VectorResponse(BaseModel):
    text: str
    vector: list[float]
    dim: int

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class Server:
    def __init__(self):
        self._model = SentenceTransformer(model_id)

    @app.get("/vectors", response_model=VectorResponse)
    async def get_vectors(self, text: str) -> VectorResponse:
        vector = await self.encode_batched(text)
        return VectorResponse(text=text, vector=vector, dim=len(vector))

    @serve.batch(max_batch_size=max_batch_size, batch_wait_timeout_s=batch_wait_timeout_s)
    async def encode_batched(self, texts: list[str]) -> list[list[float]]:
        results = []
        for text in texts:
            # encode() returns an ndarray; tolist() makes it JSON-serializable
            results.append(self._model.encode(text).tolist())
        print("encode_batched, results", len(results))
        return results

server = Server.bind()
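
For context, my client looked roughly like this (a sketch; the host, port, and concurrency are placeholders, not my exact script):

import concurrent.futures

import requests

def fetch(i: int) -> int:
    # Serve's HTTP proxy listens on http://127.0.0.1:8000 by default.
    resp = requests.get(
        "http://127.0.0.1:8000/vectors",
        params={"text": f"sample text {i}"},
        timeout=30,
    )
    return resp.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for status in pool.map(fetch, range(32)):
        print(status)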

Things worked great when all the requests finished successfully. But if I manually killed the client, the server would hang and would not accept any future requests. I checked the Ray dashboard: all the Server tasks were either finished or failed, and there were no running tasks, so I don't understand why things got stuck. For batching, do I need to do anything special when a request is cancelled by a client?
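
One workaround I experimented with (just a sketch, and I am not sure it is the right approach) was to shield the batched call, so that a client disconnect cancels only the outer handler rather than the shared batch coroutine:

import asyncio

@app.get("/vectors", response_model=VectorResponse)
async def get_vectors(self, text: str) -> VectorResponse:
    # asyncio.shield keeps a client-side cancellation from propagating
    # into the shared batch; the outer await itself is still cancellable.
    vector = await asyncio.shield(self.encode_batched(text))
    return VectorResponse(text=text, vector=vector, dim=len(vector))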

I have struggled with this issue for the past 8 hours and would appreciate any help.

To narrow things down, I also tried a sample program from the Ray website:

import asyncio
from typing import List

from ray import serve
from starlette.requests import Request

@serve.deployment
class BatchedDeployment:
    @serve.batch(max_batch_size=5, batch_wait_timeout_s=0.1)
    async def batch_handler(self, requests: List[Request]) -> List[str]:
        response_batch = []
        for r in requests:
            name = (await r.json())["name"]
            response_batch.append(f"Hello {name}!")

        await asyncio.sleep(1)  # <==== added some delay so that I can cancel the task in the middle
        return response_batch

    async def __call__(self, request: Request):
        return await self.batch_handler(request)

app = BatchedDeployment.bind()

I sent multiple requests to the server and hit "Ctrl-C" at random points to stop them. When I resumed sending requests, the server hung. I saw the following Ray output:
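
To reproduce this more deterministically than with Ctrl-C, a client along these lines drops some requests mid-batch via a short timeout (a sketch; the URL and timings are placeholders):

import asyncio

import httpx

async def main() -> None:
    async with httpx.AsyncClient() as client:
        for i in range(20):
            try:
                # A 0.5 s timeout is shorter than the handler's 1 s sleep, so the
                # client disconnects while the batch is still being processed.
                r = await client.post(
                    "http://127.0.0.1:8000/",
                    json={"name": f"user-{i}"},
                    timeout=0.5,
                )
                print(i, r.text)
            except httpx.TimeoutException:
                print(i, "dropped (client-side timeout)")

asyncio.run(main())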

(ProxyActor pid=66328) INFO 2023-12-15 18:08:42,772 proxy 172.21.204.7 222f17ed-7067-4d6f-9a38-c815177f77ef / default proxy.py:1004 - Client for request 222f17ed-7067-4d6f-9a38-c815177f77ef disconnected, cancelling request.

I finally figured out the root cause of my issue. It is a regression in Ray 2.8.1 itself. See the following note in the Ray 2.9.0 release notes:

Fixed issue during batch requests when a request is dropped, the batch loop will be killed and not processed any future requests.

I am quite puzzled why I have not seen others running into this issue.
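
Until I can upgrade, I added a startup guard so the service fails fast on affected versions (a minimal sketch based on the release note above):

from packaging.version import Version

import ray

# Per the Ray 2.9.0 release note, versions before 2.9.0 can kill the batch
# loop when a request is dropped, so refuse to start on them.
if Version(ray.__version__) < Version("2.9.0"):
    raise RuntimeError(
        f"Ray {ray.__version__} is affected by the serve.batch cancellation "
        "regression; upgrade to ray>=2.9.0."
    )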