Avoid downtime when updating script dependencies

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I have a general question about cluster and deployment architecture. My system exposes a few ML models and algorithms through a Ray REST API. I want these models to be always available as Actors so they can compute and respond whenever a request is received. My plan is to build a Docker image containing the model code, model checkpoints, and all Python dependencies. Dependencies and models get updated frequently, so the Docker image must be rebuilt and redeployed regularly.

I wonder what the best way is to design the cluster config and deployment flow to minimize downtime. My plan is to deploy this in an autoscaling Kubernetes cluster, but I am unsure about the best strategy for redeploying Docker images and re-registering Actors that already exist while avoiding downtime. Does Ray support this natively, or are there any tricks I'd need to implement on my own?

Many thanks!

Hi @ezorita, welcome to the Ray forums!

I’d recommend trying Ray Serve, Ray’s model serving library. You can decorate your model classes with @serve.deployment, which turns each one into a set of replicated Ray actors that serve requests.
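
As a rough illustration, here is a minimal sketch of what wrapping a model could look like. The class EchoModel, its greeting parameter, and the build_app helper are hypothetical names made up for this example, and the "text" request field is an assumption about your payload. It uses the function-call form of serve.deployment, which is equivalent to applying the decorator, and imports Ray lazily so the model class stays testable without a cluster.

```python
class EchoModel:
    """Hypothetical stand-in for a real model; swap in your own inference code."""

    def __init__(self, greeting: str = "hello"):
        # In a real deployment this is where you would load the checkpoint.
        self.greeting = greeting

    def predict(self, text: str) -> dict:
        # Plain Python method, so it can be unit-tested without Ray.
        return {"result": f"{self.greeting}, {text}"}

    async def __call__(self, request):
        # Ray Serve passes a Starlette request for HTTP traffic.
        payload = await request.json()
        return self.predict(payload["text"])


def build_app(num_replicas: int = 2):
    """Wrap EchoModel in a Serve deployment (needs `pip install "ray[serve]"`)."""
    from ray import serve  # imported lazily so the class above works without Ray

    # Same effect as decorating the class with @serve.deployment(num_replicas=...).
    deployment = serve.deployment(num_replicas=num_replicas)(EchoModel)
    return deployment.bind(greeting="hello")
```

With Ray installed, calling serve.run(build_app()) from a driver script should start the replicas and expose the model over HTTP. The replica count here is only a placeholder to tune for your workload.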

KubeRay offers a K8s operator that lets you run your Ray Serve deployments on your Kubernetes cluster. The operator offers fault tolerance, high availability, and zero-downtime updates, which should fit your needs. Please feel free to follow up with any questions!