Dynamic request batching: partial response streaming

Hi everyone,

I’m currently diving into Ray and particularly intrigued by the concept of dynamic request batching in Ray Serve. I’ve been reviewing the Dynamic Request Batching (Advanced Guides), which has been super helpful.

From what I understand, the default behavior seems to be that each individual request in a batch is held until the entire batch has been processed. I’m curious:

Is there a way to override or work around this behavior so that a request can return its result as soon as it’s ready, even if other requests in the batch are still being processed?

Any insights, best practices, or workarounds would be greatly appreciated!

Thanks in advance.

Hi! Yes, I think this is possible; it's called streaming the output. You can read more about how to implement it here: Set Up FastAPI and HTTP — Ray 2.47.1

In the guide you linked, they also talk about streaming a bit: Dynamic Request Batching — Ray 2.47.1

When serve.batch-decorated functions return streaming generators over HTTP, this action allows the end client’s connection to terminate once its call is done, instead of waiting until the entire batch is done.
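
To make that concrete, here is a rough sketch of a batched streaming deployment, loosely modeled on the pattern in the batching guide. As far as I can tell from that guide, the batched method is an async generator that yields one output per request on each step, and a request that has nothing left to produce gets the StopIteration class in its slot, which ends that caller's stream while the rest of the batch keeps going. The names batch_step, build_app, StreamingCounter, count_up and the "max" query parameter are made up for illustration, and the batch size and timeout values are arbitrary, so treat this as a starting point rather than a reference implementation.

```python
import asyncio
from typing import AsyncGenerator, List, Union


def batch_step(limits: List[int], i: int) -> List[Union[str, type]]:
    """Build the outputs for step i, one entry per request in the batch.

    A request whose limit has been reached gets the StopIteration class
    in its slot (assumption: this is the sentinel Serve uses to close
    that request's stream early).
    """
    return [f"{i}\n" if i < limit else StopIteration for limit in limits]


def build_app():
    # Imported here so batch_step can be tried out without Ray installed.
    from ray import serve
    from starlette.requests import Request
    from starlette.responses import StreamingResponse

    @serve.deployment
    class StreamingCounter:  # hypothetical deployment name
        @serve.batch(max_batch_size=4, batch_wait_timeout_s=0.1)
        async def count_up(
            self, limits: List[int]
        ) -> AsyncGenerator[List[Union[str, type]], None]:
            # Serve collects the individual `limit` arguments into `limits`.
            for i in range(max(limits)):
                yield batch_step(limits, i)
                await asyncio.sleep(0.1)  # stand-in for real per-step work

        async def __call__(self, request: Request) -> StreamingResponse:
            limit = int(request.query_params.get("max", "5"))
            # Each caller passes a single value and gets back its own
            # async generator carved out of the batched one.
            gen = self.count_up(limit)
            return StreamingResponse(gen, media_type="text/plain")

    return StreamingCounter.bind()
```

If this works the way the guide describes, running serve.run(build_app()) and sending two concurrent requests with max=1 and max=3 should let the first client's response finish after a single chunk while the second keeps streaming. Note that this helps when results are produced step by step (token generation, for example); if the model returns the whole batch in one call, there is nothing to stream early.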