Yes, your issue is likely due to the Ray Serve deployment taking too long to initialize (e.g., downloading large model weights), causing the Kubernetes liveness/readiness probes to fail and the pods to restart. This is a common problem when loading large LLMs: the default probe timeouts are often too short for the download and initialization to complete, and per the RayService troubleshooting docs, Kubernetes will then repeatedly restart the pods before the application ever becomes ready.
To resolve this, increase `initialDelaySeconds` and `timeoutSeconds` (and optionally `failureThreshold`) on the liveness and readiness probes in your pod spec, giving the model enough time to download and initialize. This is a Kubernetes configuration detail rather than a Ray-specific issue. Also ensure your cluster has enough resources and network bandwidth for the model download, as resource starvation can cause repeated restarts as well (Discourse: deployment init takes too long).
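As a rough sketch, the probe settings above would look something like this inside the head/worker pod template of your RayService manifest. The container name, probe path, and port are placeholders for whatever your setup actually uses; the numeric values are illustrative and should be sized to your model's real download and startup time:

```yaml
# Hypothetical pod-template fragment — adjust names, paths, and ports
# to match your RayService/RayCluster spec and health endpoint.
spec:
  containers:
    - name: ray-head            # placeholder container name
      readinessProbe:
        httpGet:
          path: /healthz        # placeholder; use your actual health endpoint
          port: 8000
        initialDelaySeconds: 300  # wait for the model download before probing
        periodSeconds: 10
        timeoutSeconds: 10        # allow slow responses during initialization
        failureThreshold: 30      # tolerate many failures while starting up
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8000
        initialDelaySeconds: 300
        periodSeconds: 30
        timeoutSeconds: 10
```

The key idea is that `initialDelaySeconds` (or a generous `failureThreshold`) must cover the worst-case model download time, otherwise Kubernetes kills the pod mid-initialization and the cycle repeats.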
Would you like more detail on how to adjust your Kubernetes YAML to fix this?
Hint: Mention @RunLLM in the post for followups.