We are facing a weird issue where the RayCluster is killed and reprovisioned if the RayService is not completely up. The worker nodes are failing the health probe. We are getting the error below.
Any help will be appreciated!
Starting from Ray 2.8 and KubeRay v1.1.0, only worker pods running a Ray Serve replica will have a ProxyActor and be marked as ready. Can you let me know which Ray and KubeRay versions you're running?
You can also inspect the events and logs for more details on which pods are unready and why. See: ray/doc/source/cluster/kubernetes/user-guides/rayservice-no-ray-serve-replica.md at releases/2.47.1 · ray-project/ray · GitHub
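If it helps, here is a rough sketch of one way to list the unready pods and the reason Kubernetes reports for each. It only reads the standard `Ready` condition from the output of `kubectl get pods -o json`. The helper names (`unready_pods`, `fetch_pods`) and the `default` namespace are my own assumptions for illustration, not part of Ray or KubeRay, so adjust them to your setup.

```python
import json
import subprocess


def unready_pods(pod_list):
    """Return (pod name, detail) for every pod whose Ready condition is not "True".

    `pod_list` is the parsed JSON of `kubectl get pods -o json`.
    """
    result = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        conditions = pod.get("status", {}).get("conditions", [])
        # Find the standard Kubernetes "Ready" pod condition, if present.
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            result.append((name, "no Ready condition reported"))
        elif ready.get("status") != "True":
            detail = ready.get("message") or ready.get("reason") or "unknown"
            result.append((name, detail))
    return result


def fetch_pods(namespace="default"):
    """Hypothetical helper: shell out to kubectl and parse the pod list."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    # Namespace is an assumption; use the one your RayService lives in.
    for name, detail in unready_pods(fetch_pods("default")):
        print(f"{name}: {detail}")
```

For any pod this flags, `kubectl describe pod <pod-name>` should show the probe failure events in more detail.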
It would help in figuring this out if you could share some more logs. Thank you!