Hello, I am running a Ray Serve deployment with multiple replicas and dynamic batching.
Sometimes a request needs more than 10 minutes to process, and making the client wait that long for the response is not optimal. Is there a way I can acknowledge the request immediately, process the model in the background, and send the results to a callback URL at the end, without having to use FastAPI or something similar?
Hi @Maris_Basha, welcome to the forums!
This pattern can be handled using async requests. Serve doesn’t provide this natively yet, but we’re collecting community feedback on this feature through this RFC. Please take a look and chime in with your thoughts!
For now, Ray Serve developers often rely on an additional task queue like Celery to queue up their users’ requests and responses. In these systems, the request lifecycle typically looks like this (see the sketch after the list):
- Submit a request, which gets queued on the task queue. The user gets a response immediately.
- In the background the request gets processed. The result gets placed on a response queue.
- The user polls the server periodically to see whether the request has finished. Once it has, the user retrieves the result.
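
For illustration, here’s a minimal sketch of that lifecycle implemented directly inside a single Serve deployment with an in-memory job store. The `run_model` stub, the `_jobs` dict, and the POST/GET route shapes are all assumptions for the example, not a built-in Serve API:

```python
import asyncio
import uuid

from ray import serve
from starlette.requests import Request


async def run_model(payload: dict) -> dict:
    """Stand-in for the long-running model call (assumption)."""
    await asyncio.sleep(600)  # e.g. a 10-minute inference
    return {"echo": payload}


@serve.deployment
class AsyncModel:
    def __init__(self):
        # job_id -> {"status": ..., "result": ...}; in-memory, lost on restart.
        self._jobs: dict[str, dict] = {}
        self._tasks: set[asyncio.Task] = set()

    async def _process(self, job_id: str, payload: dict):
        try:
            result = await run_model(payload)
            self._jobs[job_id] = {"status": "done", "result": result}
        except Exception as e:
            self._jobs[job_id] = {"status": "failed", "error": str(e)}

    async def __call__(self, request: Request):
        if request.method == "POST":
            # 1. Queue the request and respond immediately with a job id.
            payload = await request.json()
            job_id = uuid.uuid4().hex
            self._jobs[job_id] = {"status": "pending"}
            task = asyncio.create_task(self._process(job_id, payload))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
            return {"job_id": job_id}
        # 2./3. The client polls GET /?job_id=... until the job finishes.
        job_id = request.query_params["job_id"]
        return self._jobs.get(job_id, {"status": "unknown"})


app = AsyncModel.bind()
# serve.run(app)  # then: POST a payload, then poll with the returned job_id
```

Note the caveat with this in-memory version: with multiple replicas, a poll can land on a replica that never saw the job. That’s exactly why an external queue and result store (e.g. Celery plus a broker) is used in practice.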
Apologies for picking up this thread after such a long hiatus.
Just out of curiosity, is there a reason to adopt something like Celery when it seems like a plain Ray task (or the newer “workflow”) would do the trick? Something like the sketch below:
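
(A rough sketch of what I mean; `run_model` and the callback URL are made-up placeholders.)

```python
import ray
import requests


def run_model(payload: dict) -> dict:
    """Stand-in for the long-running model call (assumption)."""
    return {"echo": payload}


@ray.remote
def process_and_notify(payload: dict, callback_url: str) -> None:
    # Runs in the background on the cluster; the caller never blocks on it.
    result = run_model(payload)
    # Push the result to the caller instead of making them wait or poll.
    requests.post(callback_url, json=result, timeout=10)


# From a Serve handler, acknowledge immediately and fire off the task:
process_and_notify.remote({"x": 1}, "http://client.example/callback")
```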