Best practice for taking down Ray cluster deployments on Kubernetes

ecm200 · May 11, 2022, 2:56pm

I am experimenting with deploying Ray clusters onto a Kubernetes cluster, following the documentation to start up an operator and then individual Ray clusters.

The setup process is fine, and I have managed to get 3 separate Ray clusters happily co-existing in the same namespace and working as expected.

My question relates to the process of taking down a given Ray cluster, as I have been seeing inconsistent behaviour, with some resources not being deleted.

What is the best order of commands to achieve this?

Just to be clear, my intention here is to remove a single Ray cluster from the Kubernetes service whilst keeping the others running.

I understand that to completely take down Ray, I need to uninstall the operator after I have uninstalled any running Ray cluster instances.

This is the current order in which I execute commands:

```bash
# Delete a load balancer service for access to head node on private VNET
kubectl -n ray-clusters delete service ray1-cluster-head-access

# Delete the custom resource for this cluster
kubectl -n ray-clusters delete RayCluster ray1-cluster

# Uninstall the helm chart for this cluster
helm -n ray-clusters uninstall ray1-cluster
```
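A minimal sketch of the same three steps as a reusable shell function, with one addition: it polls until the cluster's pods are actually gone, so a partially deleted cluster is reported instead of silently left behind. This is not a documented Ray procedure. It assumes the names used in the commands above, that the RayCluster resource has the same name as its Helm release (as it does here), and that the cluster's pods can be matched by a label selector you pass in. The selector in the usage line is hypothetical, so check your pods' real labels with `kubectl get pods --show-labels` first.

```bash
#!/usr/bin/env bash
# Sketch only: tear down ONE Ray cluster, leaving the operator and the other
# clusters untouched, and return success only once its pods are gone.
# Assumption: the caller supplies a label selector matching this cluster's pods.

teardown_ray_cluster() {
  local ns="$1" release="$2" service="$3" selector="$4"
  local pods attempts=60   # poll up to 60 times, 5 s apart (about 5 minutes)

  # 1. Remove the LoadBalancer service that was created outside the chart.
  kubectl -n "$ns" delete service "$service" --ignore-not-found || return 1

  # 2. Delete the RayCluster resource (assumed to share the release name).
  #    kubectl delete waits by default until the object is really gone,
  #    including any finalizers, so this step blocks while cleanup runs.
  kubectl -n "$ns" delete raycluster "$release" --ignore-not-found || return 1

  # 3. Uninstall the Helm release that created the cluster.
  helm -n "$ns" uninstall "$release" || return 1

  # 4. Poll until no pods match the selector, or give up.
  while [ "$attempts" -gt 0 ]; do
    pods=$(kubectl -n "$ns" get pods -l "$selector" -o name) || return 1
    [ -z "$pods" ] && return 0
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "timed out waiting for pods matching '$selector' in $ns" >&2
  return 1
}
```

Example call, where the `ray-cluster-name` label key is an assumption: `teardown_ray_cluster ray-clusters ray1-cluster ray1-cluster-head-access ray-cluster-name=ray1-cluster`. Plain polling with `kubectl get` keeps the "no pods left" case simple to detect.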

Dmitri · May 12, 2022, 3:08am

Copying the relevant discussion from Ray Slack for record-keeping purposes!

If you use Helm to deploy the operator following the current OSS docs, the recommendation is to install the operator and each Ray cluster as separate Helm releases.
Ray cluster releases should be uninstalled before the operator release is uninstalled.
https://docs.ray.io/en/latest/cluster/kubernetes-advanced.html#running-multiple-ray-clusters
https://docs.ray.io/en/latest/cluster/kubernetes-advanced.html#cleaning-up-resources
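The ordering above (cluster releases first, operator release last) can be sketched as a small wrapper. It assumes the operator and each Ray cluster were installed as separate Helm releases; all namespace and release names are placeholders supplied by the caller. Before removing the operator it waits until no RayCluster objects remain. The reasoning is general Kubernetes behaviour: an object carrying a finalizer stays in Terminating until the controller responsible for it removes the finalizer, so removing the operator too early could leave a cluster stuck.

```bash
#!/usr/bin/env bash
# Sketch only: uninstall every Ray cluster release, wait for the RayCluster
# objects to disappear, and only then uninstall the operator release.
# All namespace and release names are placeholders supplied by the caller.

uninstall_ray_stack() {
  local cluster_ns="$1" operator_ns="$2" operator_release="$3"
  shift 3                          # remaining arguments: Ray cluster releases
  local release remaining attempts=60

  # 1. Remove the cluster releases while the operator can still clean up.
  for release in "$@"; do
    helm -n "$cluster_ns" uninstall "$release" || return 1
  done

  # 2. Poll (up to 60 x 5 s) until no RayCluster objects remain.
  while :; do
    remaining=$(kubectl -n "$cluster_ns" get rayclusters -o name) || return 1
    [ -z "$remaining" ] && break
    attempts=$((attempts - 1))
    if [ "$attempts" -le 0 ]; then
      echo "RayClusters still present in $cluster_ns; keeping the operator" >&2
      return 1
    fi
    sleep 5
  done

  # 3. Nothing is left for the operator to manage, so remove it last.
  helm -n "$operator_ns" uninstall "$operator_release"
}
```

Because step 2 checks every RayCluster in the namespace, pass all of the cluster releases living there, for example `uninstall_ray_stack ray-clusters ray-operator-ns ray-operator ray1-cluster ray2-cluster ray3-cluster` (placeholder names).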

It’s possible that some bugs block deletion even when the procedure in the docs is followed.
I would recommend looking into deploying with KubeRay, which is overall more stable.
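When a resource does refuse to go away, a quick first check is whether Kubernetes has already accepted the delete and is waiting on finalizers. In Kubernetes generally, an object whose `metadata.deletionTimestamp` is set but which still exists is being held by the entries in `metadata.finalizers`, which the responsible controller has to remove. The helper below is hypothetical, not part of Ray or KubeRay; it only reads those two fields with `kubectl get -o jsonpath`.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper): report whether an object is stuck in
# deletion, i.e. has a deletionTimestamp but still exists, and which
# finalizers are holding it.

explain_stuck_deletion() {
  local ns="$1" resource="$2"   # resource as kind/name, e.g. raycluster/ray1-cluster
  local out
  out=$(kubectl -n "$ns" get "$resource" --ignore-not-found \
    -o jsonpath='{.metadata.deletionTimestamp}{"|"}{.metadata.finalizers}') || return 1
  if [ -z "$out" ]; then
    echo "$resource: not found (already deleted)"
    return 0
  fi
  # Split "timestamp|finalizers" on the first separator.
  local ts="${out%%|*}" finalizers="${out#*|}"
  if [ -z "$ts" ]; then
    echo "$resource: exists, deletion not requested"
  else
    echo "$resource: deletion requested at $ts; pending finalizers: ${finalizers:-none}"
  fi
}
```

For example, `explain_stuck_deletion ray-clusters raycluster/ray1-cluster` after a delete that seems to hang shows which finalizer, if any, is blocking it.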
