Help designing a fire-and-forget server for large batch inference

Well, after posting this here I did some digging and found this thread: Job Queue in Ray? - #24 by ericl

That thread led me to these RFCs, and to discovering the existence of Ray Workflows, which I believe could cover my use case :slight_smile:

In any case, I'd appreciate guidance on how to use Workflows to design something like this, if anybody has tips or best practices. I'm aware it's still in Alpha.
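To make the question concrete, here's the kind of fire-and-forget flow I'm imagining, sketched against the alpha `@workflow.step` API as I understand it from the docs. The step names, the S3 path, and the `workflow_id` are all placeholders of mine, and I may well be misusing the API:

```python
import ray
from ray import workflow

# Hypothetical steps; the real ones would load data and score it with a model.
@workflow.step
def load_batch(path: str) -> list:
    return ["example", "inputs"]  # stand-in for reading the actual batch

@workflow.step
def run_inference(batch: list) -> list:
    return [len(x) for x in batch]  # stand-in for a model forward pass

ray.init()
workflow.init()

# Build the DAG, then fire and forget: run_async() returns immediately,
# and the workflow keeps running durably under the given id.
pipeline = run_inference.step(load_batch.step("s3://bucket/batch-001"))
pipeline.run_async(workflow_id="batch-001")
```

If I'm reading the docs right, a client could later poll with `workflow.get_status("batch-001")` and fetch the result with `ray.get(workflow.get_output("batch-001"))`, which is exactly the job-queue semantics I'm after.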

I'm also excited to see the development of async requests in Serve!
One suggestion: make Ray Workflows more visible. It took me quite a while to find out it existed, and I think use cases like mine are becoming more and more common (as the second RFC suggests) :slight_smile: