Batching doesn't work: requests are processed one by one

Hi, I'm experimenting with serve.batch:

First, I start a local cluster with ray start --head, then launch Serve with 'serve start'. Next, I deploy the simple code below to Serve:
import time
from typing import List

from pydantic import BaseModel
from ray import serve
from starlette.requests import Request


class PricingRequest(BaseModel):
    valuation_date: str
    pid: str
    payoff: dict
    cmm: dict


class PricingApi:
    @serve.batch
    async def present_value(self, reqs: List[Request]):
        print(f'batched {len(reqs)} reqs')
        # Parse the JSON body of every request in the batch.
        data = []
        for r in reqs:
            data.append(await r.json())
        # Return one result per request, in the same order.
        return [{'pid': d['pid'], 'pv': 0.0} for d in data]

    async def __call__(self, req: Request):
        # Each call passes a single request; serve.batch groups concurrent calls.
        return await self.present_value(req)

So far, everything works fine and I'm able to call the API with a curl POST. Then I start a Java application that sends POST requests in parallel. However, on the server, every log line reads 'batched 1 reqs'. I understand serve.batch is opportunistic, but I would expect the example above to produce batches larger than one.

I'm using the latest nightly build and Python 3.8.


Hi! If your handler returns instantly, there may not be enough time for a batch of requests to accumulate. Does it behave as expected if you add a time.sleep in the handler to simulate a time-intensive computation? Alternatively, you could set @serve.batch(batch_wait_timeout_s=10), which waits up to 10 seconds to accumulate a batch before processing it. See also the Batching Tutorial (Ray v2.0.0.dev0).
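
To illustrate why an instant handler tends to see batches of one, here is a minimal asyncio-only sketch of opportunistic batching. It is a toy model and not Ray Serve's actual implementation; ToyBatcher and demo are hypothetical names, and only the parameter names max_batch_size and batch_wait_timeout_s are borrowed from serve.batch. It assumes requests arrive a few milliseconds apart, as they roughly would from a parallel HTTP client.

```python
import asyncio
from typing import List


class ToyBatcher:
    """Toy opportunistic batcher (hypothetical; not Ray Serve's code)."""

    def __init__(self, handler, max_batch_size: int, batch_wait_timeout_s: float):
        self.handler = handler
        self.max_batch_size = max_batch_size
        self.batch_wait_timeout_s = batch_wait_timeout_s
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item):
        # Enqueue one request and wait for its individual result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request
            deadline = loop.time() + self.batch_wait_timeout_s
            while len(batch) < self.max_batch_size:
                try:
                    # Take anything that is already waiting.
                    batch.append(self.queue.get_nowait())
                    continue
                except asyncio.QueueEmpty:
                    pass
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break  # no time left to wait for more requests
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            results = await self.handler([item for item, _ in batch])
            for (_, fut), res in zip(batch, results):
                if not fut.done():
                    fut.set_result(res)


async def demo(batch_wait_timeout_s: float, n: int = 5) -> List[int]:
    sizes: List[int] = []

    async def handler(pids: List[str]):
        # Returns instantly, like the handler in the question.
        sizes.append(len(pids))
        return [{"pid": pid, "pv": 0.0} for pid in pids]

    batcher = ToyBatcher(handler, max_batch_size=n,
                         batch_wait_timeout_s=batch_wait_timeout_s)
    worker = asyncio.ensure_future(batcher.run())
    tasks = []
    for i in range(n):
        tasks.append(asyncio.ensure_future(batcher.submit(f"p{i}")))
        await asyncio.sleep(0.005)  # requests arrive 5 ms apart
    await asyncio.gather(*tasks)
    worker.cancel()
    return sizes


if __name__ == "__main__":
    print(asyncio.run(demo(0.0)))  # no wait: [1, 1, 1, 1, 1]
    print(asyncio.run(demo(1.0)))  # wait up to 1 s: [5]
```

With no wait, each request is handled before the next one arrives, so every batch has size one. With a wait timeout, the batcher holds the first request until either the batch is full or the timeout expires, which is the effect batch_wait_timeout_s is meant to provide.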

Thanks Archit, it works. I will continue my prototyping :slight_smile: