I’m currently working on a project using Ray Serve for LLM Depployment and I’m interested in setting a limit on the number of retries for deployments that fail, I’m unable to find a direct method to restrict how many times a deployment should retry upon failure.
Could anyone provide insights or best practices on how to implement retry limits for deployment failures in Ray Serve? Are there any patterns or custom logic that you recommend integrating into the deployment code to achieve this?
Hi Medha! I took a quick look through our docs and I wasn’t able to find documentation for retrys on deployments for Ray Serve. I think the general idea is to use custom retry logic in your application / client code, or alternatively you can look into using Ray actors + Ray tasks, which both have some sort of retry logic.
Thank you for clarifying! That makes sense—I’ll look into implementing custom retry logic within our client code to handle failures more gracefully. I’ll also explore Ray actors and Ray tasks, since their built-in retry capabilities could simplify some of the heavy lifting. Really appreciate the insight!