Fault tolerancy of deployments

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi everyone. I am trying to build a fault tolerant app with ray using deployments but I have some questions I cannot answer by reading docs. Why is there no max_task_retries or max_restarts options for deployments? I couldn’t find anything in the docs and it seems like max_task_retries is set to 0 for deployments. Is there way to retry requests to deployments? And what are the advantages of using deployments instead of actors?

Hi @Aleksandr_M , we have retry when tasks failed, it is controlled by RAY_SERVE_HTTP_REQUEST_MAX_RETRIES, more details: Performance Tuning — Ray 2.3.1

For the deployment advantage, you don’t need to take care of autoscaling & lifecycle & health check etc.