Head pod does not restart after deleting/draining

Hi there,

I am deploying Ray on K8s using the Helm chart. My environment and values.yaml are shown below:

  • ray == 1.12.0
  • running Ray on K8s (GKE)
    • 1.21.10-gke.2000

# `values.yaml`
image: rayproject/ray:1.12.0-py38
podTypes:
  rayHeadType:
    CPU: 4
    memory: 30Gi
    GPU: 0
    rayResources: { "CPU": 0 }
  rayWorkerType:
    minWorkers: 0
    maxWorkers: 6
    memory: 30Gi
    CPU: 3
    GPU: 0

Whenever the head pod is deleted, or the node it runs on is drained, for whatever reason, the head pod is never created again.

For now, my workaround is to modify the spec of rayHeadType slightly, which forces the Ray cluster to restart (see "Ray Operator Advanced Configuration" in the Ray 1.12.1 docs). For instance, I change the CPU of rayHeadType to 3 and then change it back to 4.
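
To make the workaround concrete, here is a rough sketch of the temporary edit to values.yaml. Only the CPU value of rayHeadType changes and every other field stays as posted above. I am assuming the change is applied with a normal Helm upgrade of the release, and that the value is then reverted to 4 and applied a second time.

```yaml
# Temporary edit: only the head pod's CPU differs from the original values.yaml.
podTypes:
  rayHeadType:
    CPU: 3          # originally 4; revert to 4 once the head pod is back
    memory: 30Gi
    GPU: 0
    rayResources: { "CPU": 0 }
```

Any change to the rayHeadType spec seems to have the same effect. CPU is just the field I happened to pick.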

Any tips for this situation?