Yes, your issue is likely due to the Ray Serve deployment taking too long to initialize (e.g., downloading large model weights), causing the Kubernetes liveness/readiness probes to fail and the pods to restart. This is a common problem when loading large LLMs: the default probe timeouts are often too short for the download and initialization to complete, and per the RayService troubleshooting docs, Kubernetes will then repeatedly restart the pods before the application ever becomes ready.
To resolve this, increase `initialDelaySeconds` and `timeoutSeconds` (and optionally `failureThreshold`) on the liveness and readiness probes in your pod spec, giving the model enough time to download and initialize. This is a Kubernetes configuration detail rather than a Ray-specific issue. Also ensure your cluster has enough resources and network bandwidth for the model download, as resource starvation can cause repeated restarts as well (Discourse: deployment init takes too long).
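As a rough sketch, the probe settings above would look something like this inside the head/worker pod template of your RayService manifest. The container name, probe path, and port are placeholders for whatever your setup actually uses; the numeric values are illustrative and should be sized to your model's real download and startup time:

```yaml
# Hypothetical pod-template fragment — adjust names, paths, and ports
# to match your RayService/RayCluster spec and health endpoint.
spec:
  containers:
    - name: ray-head            # placeholder container name
      readinessProbe:
        httpGet:
          path: /healthz        # placeholder; use your actual health endpoint
          port: 8000
        initialDelaySeconds: 300  # wait for the model download before probing
        periodSeconds: 10
        timeoutSeconds: 10        # allow slow responses during initialization
        failureThreshold: 30      # tolerate many failures while starting up
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8000
        initialDelaySeconds: 300
        periodSeconds: 30
        timeoutSeconds: 10
```

The key idea is that `initialDelaySeconds` (or a generous `failureThreshold`) must cover the worst-case model download time, otherwise Kubernetes kills the pod mid-initialization and the cycle repeats.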
Would you like more detail on how to adjust your Kubernetes YAML to fix this?
Hint: Mention @RunLLM in the post for followups.