How severely does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I have a general question on cluster and deployment architecture. My system exposes a few ML models and algorithms through a Ray REST API. I want these models to always be available as Actors, ready to compute and respond whenever a request is received. My plan is to build a Docker image containing the model code, model checkpoints, and all Python dependencies. Dependencies and models get updated frequently, so that Docker image must be rebuilt and redeployed from time to time.
I wonder what’s the best way to design the cluster config and deployment flow to minimize downtime. My plan is to deploy this in an autoscaling Kubernetes cluster, but I am unsure about the best strategy for redeploying Docker images and re-registering existing Actors while avoiding downtime. Does Ray support this natively, or are there tricks I’d need to implement on my own?
Thanks for your answer! My question concerns specifically the zero-downtime update for the Ray Kubernetes operator. I could not find in the documentation how one would update the underlying Docker image without downtime, since KubeRay does not expose a Deployment.
You can indeed update the underlying Docker image without downtime.
When you update your RayService config’s rayClusterConfig with new Docker images, the RayService operator will spin up a new Ray cluster with the new Docker images. Meanwhile, it’ll continue serving traffic using the existing Ray cluster. Once your Serve deployments are ready on the new Ray cluster, the operator will switch traffic to the new cluster and then delete the old cluster. There should be no downtime.
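To make the flow above concrete, here is a minimal sketch of a RayService manifest where the image tag is the field you would bump to trigger the rolling cluster swap. The names (`my-rayservice`, `my-registry/my-model:v2`, `my_app`, `app:deployment`) are placeholders, not values from your setup:

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-rayservice
spec:
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: app:deployment
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              # Changing this image (e.g. v1 -> v2) makes the operator
              # build a fresh Ray cluster alongside the old one.
              image: my-registry/my-model:v2
    workerGroupSpecs:
      - groupName: worker
        replicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                # Keep the worker image in sync with the head image.
                image: my-registry/my-model:v2
```

Applying the updated manifest with `kubectl apply -f rayservice.yaml` is enough; the operator handles spinning up the new cluster, waiting for the Serve deployments to become ready, switching traffic, and tearing down the old cluster.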