Hi everyone,
I’m currently learning Ray and am particularly interested in dynamic request batching in Ray Serve. I’ve been reading the Dynamic Request Batching page in the Advanced Guides, which has been very helpful.
From what I understand, the default behavior seems to be that each individual request in a batch is held until the entire batch has been processed. I’m curious:
Is there a way to override or work around this behavior so that a request can return its result as soon as it’s ready, even if other requests in the batch are still being processed?
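To make the question concrete, here is a minimal sketch of the behavior I mean. It is written in plain asyncio rather than Ray Serve so that it runs on its own. All names (`process_item`, `process_batch`, `run_batched`, `run_independent`) and the per-item delays are hypothetical. The sketch only mimics my understanding of a batched handler and is not Ray Serve’s actual implementation.

```python
import asyncio
import time

# Hypothetical per-item processing times in seconds (illustrative only).
DELAYS = {"a": 0.05, "b": 0.2, "c": 0.1}


async def process_item(name):
    # Stand-in for per-request work; each item takes a different time.
    await asyncio.sleep(DELAYS[name])
    return name.upper()


async def process_batch(names):
    # A batched handler returns one list for the whole batch, so no
    # result is handed back until the slowest item has finished.
    return await asyncio.gather(*(process_item(n) for n in names))


async def run_batched(names):
    start = time.perf_counter()
    results = await process_batch(names)
    elapsed = time.perf_counter() - start
    # Every caller observes the same completion time.
    return {n: (r, elapsed) for n, r in zip(names, results)}


async def run_independent(names):
    # What I would like instead: each request completes on its own.
    start = time.perf_counter()

    async def one(n):
        r = await process_item(n)
        return n, (r, time.perf_counter() - start)

    return dict(await asyncio.gather(*(one(n) for n in names)))


def demo():
    names = list(DELAYS)
    batched = asyncio.run(run_batched(names))
    independent = asyncio.run(run_independent(names))
    return batched, independent


if __name__ == "__main__":
    batched, independent = demo()
    for n in DELAYS:
        print(n, "batched:", round(batched[n][1], 2),
              "independent:", round(independent[n][1], 2))
```

In the batched run, request `a` waits for the slowest item `b`, while in the independent run it finishes as soon as its own work is done. The second pattern is what I am hoping to get while still using batching.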
Any insights, best practices, or workarounds would be greatly appreciated!
Thanks in advance.