Highly available head node?

Is it possible to run multiple head nodes with the Kubernetes Cluster Launcher? If not, what would be the recommended approach to scaling a head node to meet increasing demand? My initial thought was a VerticalPodAutoscaler.

The general direction we’re thinking of for Ray on Kubernetes is to not have a privileged head node. Instead, the control processes that Ray has traditionally run on the head node would be deployed separately.
Then we need to solve the problem of HA for these control processes. In particular, the Ray GCS is not currently HA. If it breaks, the cluster needs to restart.

Got it. HA is not currently supported, but it’s the direction the Ray team is headed in.

@Dmitri, just to be clear: the changes you’ve mentioned are slated only for the operator, not for the cluster launcher?

HA for Ray components themselves will affect all deployment strategies.

Changes to the deployment strategy of Ray components are indeed slated mostly for the operator.
I think it would be possible to adapt the cluster launcher to work with the same general architecture as the operator, but without a CRD, though that’s not currently planned.

Hopefully Terraform will include support for CRDs in the not-too-distant future.