I configured the cluster manually by running
ray start in Docker containers on remote servers. The head node runs in k8s and is started the same way (
However, when the head node is restarted, all information about running applications is overwritten and the applications are not restored.
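
A minimal sketch of the manual startup described above, written as a small Python helper that builds the ray start command lines. The port 6379, the --block flag, and the host name ray-head.example are assumptions for illustration, and the helper name ray_start_command is hypothetical; the real containers may use different values.

```python
import shlex


def ray_start_command(head: bool, head_address: str = "", port: int = 6379) -> list[str]:
    """Build the argv for a manual `ray start` call (illustrative only)."""
    if head:
        # Head node: starts the GCS and listens on the given port.
        return ["ray", "start", "--head", f"--port={port}", "--block"]
    if not head_address:
        raise ValueError("worker nodes need the head node address")
    # Worker node: joins the existing cluster through the head node address.
    return ["ray", "start", f"--address={head_address}:{port}", "--block"]


if __name__ == "__main__":
    # Command run inside the head container (in k8s).
    print(shlex.join(ray_start_command(head=True)))
    # Command run inside each worker container on the remote servers.
    # "ray-head.example" is a placeholder host name, not a real endpoint.
    print(shlex.join(ray_start_command(head=False, head_address="ray-head.example")))
```

With this kind of setup the cluster state lives only in the head process, which matches the behaviour I see when the head container is replaced.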
Is it possible to configure cluster fault tolerance without
Is it possible to fix this without using KubeRay and
If not, what exactly is the difference between KubeRay and regular k8s? Maybe it is possible to customize it without KubeRay directives.
How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
- python == 3.11.5
- ray == 2.8.1