Dynamic request batching: partial response streaming

jds574 · July 8, 2025, 1:59pm

Hi everyone,

I’m currently diving into Ray and am particularly intrigued by dynamic request batching in Ray Serve. I’ve been reviewing the Dynamic Request Batching guide (under Advanced Guides), which has been super helpful.

From what I understand, the default behavior seems to be that each individual request in a batch is held until the entire batch has been processed. I’m curious:

Is there a way to override or work around this behavior so that a request can return its result as soon as it’s ready, even if other requests in the batch are still being processed?

Any insights, best practices, or workarounds would be greatly appreciated!

Thanks in advance.

christina · July 8, 2025, 11:49pm

Hi! Yes, I think this is possible. It's called streaming the output, and you can read more about how to implement it here: Set Up FastAPI and HTTP — Ray 2.47.1

In the guide you linked, they also talk about streaming a bit: Dynamic Request Batching — Ray 2.47.1

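The key part is the sentinel: when one request in the batch has nothing left to yield, the batched generator puts a StopIteration object in that request's slot of the output list. Quoting the docs: “When serve.batch-decorated functions return streaming generators over HTTP, this action allows the end client’s connection to terminate once its call is done, instead of waiting until the entire batch is done.”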
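
Here is a minimal sketch of that streaming pattern, modeled on the approach described in the batching guide rather than copied from it. The batched method is an async generator that yields one list per step, with one slot per request in the batch; a request that has finished gets `StopIteration` in its slot, which lets its stream close early. The names `count_batch`, `StreamingCounter`, `build_app`, and the `max` query parameter are made up for illustration, and the batch settings are arbitrary, so treat this as a starting point and check it against your Ray version.

```python
import asyncio
from typing import AsyncGenerator, List


async def count_batch(limits: List[int]) -> AsyncGenerator[list, None]:
    """Yield one output list per step, with one slot per request in the batch.

    Request k counts from 0 up to limits[k] - 1. Once a request is done,
    its slot holds StopIteration so that its stream can end early.
    """
    for i in range(max(limits)):
        yield [str(i) if i < limit else StopIteration for limit in limits]
        await asyncio.sleep(0)  # stand-in for real per-step work


def build_app():
    """Wire the generator into Ray Serve (assumes `ray[serve]` is installed)."""
    from ray import serve
    from starlette.requests import Request
    from starlette.responses import StreamingResponse

    @serve.deployment
    class StreamingCounter:
        # Illustrative batch settings; tune them for your workload.
        @serve.batch(max_batch_size=5, batch_wait_timeout_s=0.1)
        async def count(self, limits: List[int]):
            async for step in count_batch(limits):
                yield step

        async def __call__(self, request: Request) -> StreamingResponse:
            # Each caller passes a single value; Serve assembles the batch.
            limit = int(request.query_params.get("max", "25"))
            return StreamingResponse(self.count(limit), media_type="text/plain")

    return StreamingCounter.bind()
```

To try it, something like `serve.run(build_app())` should start the deployment. With two concurrent requests asking for `max=1` and `max=3`, the first client's stream should end after its single chunk while the second keeps receiving.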
