I followed the batch inference tutorial; the only change I made is the value of batch_wait_timeout_s. Following the example, I use 9 Ray tasks to send requests to the Serve deployment (see the client sketch after the deployment code below). However, it always prints "Our input array has length: 1" no matter what value I set batch_wait_timeout_s to (0, 1, or 10). Is this expected?
```python
from typing import List

from starlette.requests import Request
from transformers import pipeline

from ray import serve


@serve.deployment
class BatchTextGenerator:
    def __init__(self, pipeline_key: str, model_key: str):
        # Load the Hugging Face pipeline (here, text-generation with gpt2).
        self.model = pipeline(pipeline_key, model_key)

    @serve.batch(max_batch_size=4, batch_wait_timeout_s=10)
    async def handle_batch(self, inputs: List[str]) -> List[str]:
        print("Our input array has length:", len(inputs))
        results = self.model(inputs)
        return [result[0]["generated_text"] for result in results]

    async def __call__(self, request: Request) -> List[str]:
        # Each request contributes one string; @serve.batch should collect
        # concurrent calls into a single list passed to handle_batch.
        return await self.handle_batch(request.query_params["text"])


generator = BatchTextGenerator.bind("text-generation", "gpt2")
```
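For reference, this is roughly how I send the requests, adapted from the tutorial's client snippet (the prompt strings and the localhost URL are placeholders for my setup):

```python
import ray
import requests

from ray import serve

# Deploy the generator defined above at the default route.
serve.run(generator)


@ray.remote
def send_query(text: str) -> str:
    # Each task sends one HTTP request carrying a single "text" param.
    resp = requests.get("http://localhost:8000/", params={"text": text})
    return resp.text


# Launch 9 tasks in parallel so their requests can overlap and be batched.
texts = [f"Once upon a time {i}" for i in range(9)]
results = ray.get([send_query.remote(text) for text in texts])
print(results)
```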
I ran the example on a RayCluster using the image rayproject/ray-ml:2.3.0 with KubeRay 0.4.0.