How severely does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I have a general question on cluster and deployment architecture. My system exposes a few ML models and algorithms through a Ray REST API. I want these models to always be available as Actors, ready to compute and respond whenever a request is received. My plan is to build a Docker image containing the model code, model checkpoints, and all Python dependencies. Dependencies and models get updated frequently, so that Docker image must be rebuilt and redeployed from time to time.
I wonder what’s the best way to design the cluster config and deployment flow to minimize downtime. My plan is to deploy this in an autoscaling Kubernetes cluster, but I am unsure about the best strategy for redeploying Docker images and re-registering existing Actors while avoiding downtime. Does Ray support this natively, or are there tricks I’d need to implement on my own?
Thanks for your answer! My question concerns specifically the zero-downtime update for the Ray Kubernetes operator. I could not find in the documentation how one would update the underlying Docker image without downtime, since KubeRay does not expose a Deployment.
You can indeed update the underlying Docker image without downtime.
When you update your RayService config’s rayClusterConfig with new Docker images, the RayService operator will spin up a new Ray cluster with the new Docker images. Meanwhile, it’ll continue serving traffic using the existing Ray cluster. Once your Serve deployments are ready on the new Ray cluster, the operator will switch traffic to the new cluster and then delete the old cluster. There should be no downtime.
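To make the flow above concrete, here is a minimal sketch of a RayService manifest where the image tag is the field you would bump to trigger the rolling cluster swap. The names (`my-rayservice`, `my-registry/my-model:v2`, `my_app`, `app:deployment`) are placeholders, not values from your setup:

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-rayservice
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: app:deployment
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              # Changing this image (e.g. v1 -> v2) makes the operator
              # build a fresh Ray cluster alongside the old one.
              image: my-registry/my-model:v2
    workerGroupSpecs:
      - groupName: worker
        replicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                # Keep the worker image in sync with the head image.
                image: my-registry/my-model:v2
```

Applying the updated manifest with `kubectl apply -f rayservice.yaml` is enough; the operator handles spinning up the new cluster, waiting for the Serve deployments to become ready, switching traffic, and tearing down the old cluster.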