Highly available head node?

hamlinkn · March 29, 2021, 5:28pm

Is it possible to run multiple head nodes with the Kubernetes Cluster Launcher? If not, what would be the recommended approach to scaling a head node to meet increasing demand? My initial thought was a VerticalPodAutoscaler.

Dmitri · March 29, 2021, 5:58pm

The general direction we’re thinking of for Ray on Kubernetes is to not have a privileged head node. Instead, the control processes that Ray has traditionally run on the head node would be deployed separately.
Then, we need to solve the problem of HA for these control processes. In particular, the Ray GCS is not currently HA. If it fails, the whole cluster needs to restart.
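
Until the GCS is HA, a driver can at least tolerate a cluster restart by retrying its connection. A minimal sketch, assuming a hypothetical `connect` callable (for example a wrapper around `ray.init(address=...)`) that raises `ConnectionError` while the cluster is down; the helper name, retry count and delays are illustrative and not part of Ray:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off exponentially.

    `connect` is a hypothetical callable supplied by the caller; it is
    assumed to raise ConnectionError while the cluster is unavailable.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In a driver this might be called as `connect_with_retry(lambda: ray.init(address="auto"))`. It only covers reconnecting: any state held by the old cluster is lost when it restarts.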

hamlinkn · March 29, 2021, 6:01pm

Got it. HA is not currently supported but it’s the direction the Ray team is headed in.

@Dmitri, just to be clear: the changes you’ve mentioned are slated only for the operator, not for the cluster launcher?

Dmitri · March 29, 2021, 6:36pm

HA for Ray components themselves will affect all deployment strategies.

Changes to the deployment strategy of Ray components are indeed slated mostly for the operator.
I think it could be possible to adapt the cluster launcher to work with the same general architecture as the operator but without a CRD, though that’s not currently planned.

Hopefully Terraform will include support for CRDs in the not-too-distant future.
