Ray head pods are getting stuck in ContainerStatusUnknown state

I am deploying a service using Ray Serve in my EKS cluster, and there are 6 different Ray services running in the cluster.

I am seeing that sometimes a Ray Serve head pod goes into the ContainerStatusUnknown state, and it only happens to one or two pods at a time; the other Ray Serve head pods keep running fine.

Another thing worrying me is why KubeRay is not re-creating the pod when the service has been unable to run for a long time. Below are the logs from KubeRay:
{"level":"info","ts":"2024-09-27T05:42:28.322Z","logger":"controllers.RayService","msg":"Check the head Pod status of the pending RayCluster","RayService":{"name":"roberta-bert","namespace":"ray-serve"},"reconcileID":"38c2c064-6481-4d9e-8bf2-1e6acb0a4b22","RayCluster name":"roberta-bert-raycluster-mpdpm"}
{"level":"info","ts":"2024-09-27T05:42:28.322Z","logger":"controllers.RayService","msg":"Skipping the update of Serve deployments because the Ray head Pod is not ready.","RayService":{"name":"roberta-bert","namespace":"ray-serve"},"reconcileID":"38c2c064-6481-4d9e-8bf2-1e6acb0a4b22"}

Ray Serve version: 2.35
KubeRay version: 1.1.0

Hi @Ritesh_K, searching for ContainerStatusUnknown online shows that it is often connected with running out of ephemeral storage or OOM issues. Could you check whether that is the case here by getting the output of kubectl describe pod for the stuck head pod and pasting it here?
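For example, something along these lines should surface the relevant details (the pod name is a placeholder; substitute the actual stuck head pod):

```bash
# Full status, recent events, and last termination reason for the stuck head pod
kubectl describe pod <head-pod-name> -n ray-serve

# The last container state often shows the root cause, e.g. Evicted
# (ephemeral storage) or OOMKilled (memory)
kubectl get pod <head-pod-name> -n ray-serve \
  -o jsonpath='{.status.containerStatuses[*].lastState}'

# Recent namespace events, e.g. evictions due to node disk pressure
kubectl get events -n ray-serve --sort-by=.lastTimestamp
```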