I wrote a simple Ray Serve program with batching:
from fastapi import FastAPI
from pydantic import BaseModel
from ray import serve
from sentence_transformers import SentenceTransformer

# model_id, max_batch_size, and batch_wait_timeout_s are defined elsewhere in my script.

class VectorResponse(BaseModel):
    text: str
    vector: list[float]
    dim: int

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class Server:
    def __init__(self):
        self._model = SentenceTransformer(model_id)

    @app.get("/vectors", response_model=VectorResponse)
    async def get_vectors(self, text: str) -> VectorResponse:
        vector = await self.encode_batched(text)
        return VectorResponse(text=text, vector=vector, dim=len(vector))

    @serve.batch(max_batch_size=max_batch_size, batch_wait_timeout_s=batch_wait_timeout_s)
    async def encode_batched(self, texts: list[str]) -> list[list[float]]:
        # .tolist() converts each embedding to a plain list of floats
        results = []
        for text in texts:
            results.append(self._model.encode(text).tolist())
        print("encode_batched, results", len(results))
        return results

server = Server.bind()
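For context, here is my mental model of what `serve.batch` is doing under the hood. This is only a simplified sketch in plain asyncio, not Ray's actual implementation; `TinyBatcher` and `demo` are names I made up for illustration. Callers enqueue their argument plus a Future, and a flush runs when the queue reaches `max_batch_size` or `batch_wait_timeout_s` elapses:

```python
import asyncio

class TinyBatcher:
    """Simplified stand-in for serve.batch: queue calls, flush them together."""

    def __init__(self, fn, max_batch_size=4, batch_wait_timeout_s=0.02):
        self.fn = fn
        self.max_batch_size = max_batch_size
        self.timeout = batch_wait_timeout_s
        self.pending = []  # list of (arg, Future) waiting for the next flush

    async def __call__(self, arg):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((arg, fut))
        if len(self.pending) == 1:
            # first caller arms the timeout-based flush
            asyncio.create_task(self._flush_later())
        if len(self.pending) >= self.max_batch_size:
            self._flush()
        return await fut

    async def _flush_later(self):
        await asyncio.sleep(self.timeout)
        self._flush()

    def _flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        results = self.fn([arg for arg, _ in batch])
        for (_, fut), result in zip(batch, results):
            if not fut.done():
                fut.set_result(result)

async def demo():
    batcher = TinyBatcher(lambda xs: [x * 2 for x in xs])
    # three concurrent calls end up in one batch
    return await asyncio.gather(*(batcher(i) for i in range(3)))

print(asyncio.run(demo()))  # prints [0, 2, 4]
```

If this model is roughly right, my guess is that a caller cancelled while awaiting its Future is what wedges the real batch queue, but I don't know Ray's internals well enough to say.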
Things worked great as long as all requests finished successfully. But if I manually killed the client mid-request, the server hung and wouldn't accept any future requests. I checked the Ray dashboard: all the Server tasks were either finished or failed, and there were no running tasks, so I don't understand why things got stuck. For batching, do I need to do anything special when a request is cancelled by the client?
I've struggled with this issue for the past 8 hours and would appreciate any help.
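One workaround I'm considering is to shield the batched call so a client disconnect can't cancel the shared work. Below is a minimal sketch of the pattern in plain asyncio (no Ray); `slow_batch_step` is a hypothetical stand-in for the batched encode, and I haven't verified this fixes the Serve hang:

```python
import asyncio

results = []

async def slow_batch_step(x):
    # stand-in for the shared batched work
    await asyncio.sleep(0.05)
    results.append(x * 2)
    return x * 2

async def handler(inner):
    # asyncio.shield: cancelling this handler cancels only the await,
    # not the shielded inner future
    return await asyncio.shield(inner)

async def main():
    inner = asyncio.create_task(slow_batch_step(21))
    task = asyncio.create_task(handler(inner))
    await asyncio.sleep(0.01)
    task.cancel()  # simulate the client going away mid-request
    try:
        await task
    except asyncio.CancelledError:
        pass
    await inner  # the shielded work still completes
    return results

print(asyncio.run(main()))  # prints [42]
```

Would something like this be the right approach for Serve, or does `serve.batch` need cancellation handling of its own?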