I wrote a simple Ray Serve program with batching:
from fastapi import FastAPI
from pydantic import BaseModel
from ray import serve
from sentence_transformers import SentenceTransformer

# model_id, max_batch_size, and batch_wait_timeout_s are defined elsewhere in my script.

class VectorResponse(BaseModel):
    text: str
    vector: list[float]
    dim: int

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class Server:
    def __init__(self):
        self._model = SentenceTransformer(model_id)

    @app.get("/vectors", response_model=VectorResponse)
    async def get_vectors(self, text: str) -> VectorResponse:
        vector = await self.encode_batched(text)
        return VectorResponse(text=text, vector=vector, dim=len(vector))

    @serve.batch(max_batch_size=max_batch_size, batch_wait_timeout_s=batch_wait_timeout_s)
    async def encode_batched(self, texts: list[str]) -> list[list[float]]:
        # .tolist() converts each embedding to a plain list of floats
        results = []
        for text in texts:
            results.append(self._model.encode(text).tolist())
        print("encode_batched, results", len(results))
        return results

server = Server.bind()
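For context, here is my mental model of what `serve.batch` is doing under the hood. This is only a simplified sketch in plain asyncio, not Ray's actual implementation; `TinyBatcher` and `demo` are names I made up for illustration. Callers enqueue their argument plus a Future, and a flush runs when the queue reaches `max_batch_size` or `batch_wait_timeout_s` elapses:

```python
import asyncio

class TinyBatcher:
    """Simplified stand-in for serve.batch: queue calls, flush them together."""

    def __init__(self, fn, max_batch_size=4, batch_wait_timeout_s=0.02):
        self.fn = fn
        self.max_batch_size = max_batch_size
        self.timeout = batch_wait_timeout_s
        self.pending = []  # list of (arg, Future) waiting for the next flush

    async def __call__(self, arg):
        fut = asyncio.get_running_loop().create_future()
        self.pending.append((arg, fut))
        if len(self.pending) == 1:
            # first caller arms the timeout-based flush
            asyncio.create_task(self._flush_later())
        if len(self.pending) >= self.max_batch_size:
            self._flush()
        return await fut

    async def _flush_later(self):
        await asyncio.sleep(self.timeout)
        self._flush()

    def _flush(self):
        if not self.pending:
            return
        batch, self.pending = self.pending, []
        results = self.fn([arg for arg, _ in batch])
        for (_, fut), result in zip(batch, results):
            if not fut.done():
                fut.set_result(result)

async def demo():
    batcher = TinyBatcher(lambda xs: [x * 2 for x in xs])
    # three concurrent calls end up in one batch
    return await asyncio.gather(*(batcher(i) for i in range(3)))

print(asyncio.run(demo()))  # prints [0, 2, 4]
```

If this model is roughly right, my guess is that a caller cancelled while awaiting its Future is what wedges the real batch queue, but I don't know Ray's internals well enough to say.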
Things worked great as long as all requests finished successfully. But if I manually killed the client mid-request, the server hung and wouldn't accept any future requests. I checked the Ray dashboard: all the Server tasks were either finished or failed, and there were no running tasks, so I don't understand why things got stuck. For batching, do I need to do anything special when a request is cancelled by the client?
I've struggled with this issue for the past 8 hours and would appreciate any help.
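One workaround I'm considering is to shield the batched call so a client disconnect can't cancel the shared work. Below is a minimal sketch of the pattern in plain asyncio (no Ray); `slow_batch_step` is a hypothetical stand-in for the batched encode, and I haven't verified this fixes the Serve hang:

```python
import asyncio

results = []

async def slow_batch_step(x):
    # stand-in for the shared batched work
    await asyncio.sleep(0.05)
    results.append(x * 2)
    return x * 2

async def handler(inner):
    # asyncio.shield: cancelling this handler cancels only the await,
    # not the shielded inner future
    return await asyncio.shield(inner)

async def main():
    inner = asyncio.create_task(slow_batch_step(21))
    task = asyncio.create_task(handler(inner))
    await asyncio.sleep(0.01)
    task.cancel()  # simulate the client going away mid-request
    try:
        await task
    except asyncio.CancelledError:
        pass
    await inner  # the shielded work still completes
    return results

print(asyncio.run(main()))  # prints [42]
```

Would something like this be the right approach for Serve, or does `serve.batch` need cancellation handling of its own?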