KubeRay Canary Deployments

[Medium]

For our production Ray Serve application, we’d like to perform gradual rollouts to make sure new code is safe. Currently, KubeRay performs a complete cutover once it confirms the new deployment is healthy.

An example strategy would be to route 10% of traffic to the new version for 1 hour, then progress to 30%, and so on.
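A staged schedule like this can be written as a list of (weight, duration) steps that a controller walks through over time. The sketch below is a minimal, hypothetical illustration. The first two weights and the 1-hour hold come from the example above. The duration of the 30% step, the final 100% step, and the names `CanaryStep` and `weight_at` are assumptions, not part of KubeRay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryStep:
    """One rollout stage: route `weight` percent of traffic to the new version for `hours`."""
    weight: int   # percent of traffic sent to the new (green) cluster
    hours: float  # how long to hold this weight before advancing


# Hypothetical schedule. The 10% / 1 hour step follows the example above;
# the 30% duration and the final 100% cutover are assumptions.
SCHEDULE = [
    CanaryStep(weight=10, hours=1.0),
    CanaryStep(weight=30, hours=1.0),
    CanaryStep(weight=100, hours=0.0),  # full cutover, held indefinitely
]


def weight_at(elapsed_hours: float, schedule: list[CanaryStep] = SCHEDULE) -> int:
    """Return the canary traffic percentage that should apply after `elapsed_hours`."""
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    boundary = 0.0
    for step in schedule:
        boundary += step.hours
        if elapsed_hours < boundary:
            return step.weight
    # Past the end of the schedule: stay on the last step's weight.
    return schedule[-1].weight
```

For example, `weight_at(0.5)` gives 10 and `weight_at(1.5)` gives 30. A real rollout controller would presumably also check health metrics before advancing each step and shift traffic back on failure; that logic is left out here.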

Our current solution uses an external AWS load balancer with one target group per cluster in a blue/green setup, and we adjust the target group weights to control the traffic percentages.
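One way to script the weight shift is sketched below. It is a minimal sketch that assumes an Application Load Balancer listener whose default action forwards to two target groups: "blue" for the current RayService cluster and "green" for the new one. `forward_action` and `set_canary_weight` are hypothetical helper names, and the ARNs are placeholders you would supply. The `modify_listener` call and the `ForwardConfig` / `TargetGroups` / `Weight` fields belong to the boto3 `elbv2` client.

```python
def forward_action(blue_tg_arn: str, green_tg_arn: str, green_weight: int) -> list[dict]:
    """Build an ALB listener 'forward' action that splits traffic between two target groups."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight is a percentage and must be in [0, 100]")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                # Blue (current cluster) receives the remainder of the traffic.
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_weight},
                # Green (new cluster) receives the canary share.
                {"TargetGroupArn": green_tg_arn, "Weight": green_weight},
            ],
        },
    }]


def set_canary_weight(listener_arn: str, blue_tg_arn: str, green_tg_arn: str, green_weight: int) -> None:
    """Apply the split to a live listener (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so forward_action() works without boto3 installed

    elbv2 = boto3.client("elbv2")
    elbv2.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=forward_action(blue_tg_arn, green_tg_arn, green_weight),
    )
```

Keeping the payload builder separate from the AWS call makes the traffic split easy to unit-test. A scheduler can then call `set_canary_weight(...)` with 10, then 30, and so on at each stage.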

We’ve also explored Argo Rollouts. However, RayService is a custom resource (CRD), and Argo Rollouts is limited to managing Kubernetes Deployments.

We would also appreciate additions to the KubeRay documentation covering more complex deployment strategies.

We are testing on Ray 2.4.0 and KubeRay nightly.

cc @Akshay_Malik @Sihan_Wang

Your solution makes sense. We have some other customers who use ArgoCD successfully, though I’m not sure how it differs from Argo Rollouts.

At this point, we don’t have any plans to make changes to the KubeRay deployment strategies.