[Cluster, Serve] Is it possible to configure cluster fault tolerance without `ray up`?

psydok · January 11, 2024, 6:42am

I have configured the cluster via manual ray start in docker containers on remote servers. I have the head node in k8s with the same startup principle (ray start).
But when the head node is refreshed all information about running applications is overwritten and applications are not restored.

Is it possible to configure cluster fault tolerance without ray up?
Is it possible to fix this without using KubeRay and ray up?
If not, what exactly is the difference between KubeRay and regular k8s? Maybe it is possible to customize it without KubeRay directives.

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Versions

python == 3.11.5
ray == 2.8.1

Topic		Replies	Views
Unable to recover from head-pod failure in k8s Ray Clusters	8	828	March 22, 2022
Kubernetes cluster only creates head node Ray Clusters	11	786	June 7, 2022
Automatically restart head node on kubernetes Kubernetes	3	853	June 24, 2021
Connecting to remote Ray cluster on K8s Ray Clusters	7	2721	September 6, 2022
Autoscaler does not seem to watch head node Kubernetes	5	736	March 26, 2021

[Cluster, Serve] Is it possible to configure cluster fault tolerance without `ray up`?

Related topics