LLM Deployment retries

Medha · January 28, 2025, 6:02am

I’m currently working on a project using Ray Serve for LLM deployment, and I’d like to set a limit on the number of retries for deployments that fail. So far I haven’t been able to find a direct way to restrict how many times a deployment retries after a failure.

Could anyone provide insights or best practices on how to implement retry limits for deployment failures in Ray Serve? Are there any patterns or custom logic that you recommend integrating into the deployment code to achieve this?

christina · January 28, 2025, 8:18pm

Hi Medha! I took a quick look through our docs and wasn’t able to find documentation on retries for Ray Serve deployments. I think the general idea is to use custom retry logic in your application / client code. Alternatively, you can look into Ray tasks and Ray actors, which both have built-in retry options (max_retries for tasks, max_restarts and max_task_retries for actors).
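As a rough illustration of the client-side approach, here is a minimal sketch of a bounded retry wrapper with exponential backoff. It is plain Python rather than a Ray Serve API. The names call_with_retries, RetriesExhausted, max_attempts, and base_delay_s are hypothetical and chosen only for this example.

```python
import random
import time


class RetriesExhausted(Exception):
    """Raised when every allowed attempt has failed (hypothetical helper type)."""


def call_with_retries(fn, *args, max_attempts=3, base_delay_s=0.5,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying at most max_attempts times in total.

    This is an illustrative client-side pattern, not part of Ray Serve.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on as err:
            last_error = err
            if attempt == max_attempts:
                break  # retry budget used up, stop trying
            # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 10% jitter
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))

    raise RetriesExhausted(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

You would wrap whatever call reaches your deployment, for example call_with_retries(send_request, prompt, max_attempts=5), where send_request is a placeholder for your own HTTP or deployment-handle call. Narrowing retry_on to transient errors such as timeouts or connection failures keeps the wrapper from retrying requests that can never succeed.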

More reading here:

Let me know if this helped at all or if you have any other questions.

Medha · January 29, 2025, 1:02pm

Thank you for clarifying! That makes sense—I’ll look into implementing custom retry logic within our client code to handle failures more gracefully. I’ll also explore Ray actors and Ray tasks, since their built-in retry capabilities could simplify some of the heavy lifting. Really appreciate the insight!
