Ray Serve for production use case

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I see that the recommended way to use Ray Serve in a production environment, according to the documentation, is to install all dependencies and files into the Ray base image and use the new image for the head and worker nodes.
My questions around this topic are:

  1. Into which path in the image should I COPY all the relevant files? Is there any other way to activate the Ray Serve deployment? (I have tried kubectl -n ray exec <head node pod> -- python script.py, but this only works after the Ray cluster is up, so I’m wondering whether the deployment can be activated when the Ray cluster starts.)
  2. I tried another approach: using ray submit with a remote URL from GitHub to activate the Ray Serve deployment, but RAM usage seems to increase greatly. If I update the code or the ML model file on GitHub and submit the same deployment again, will the RAM used by the old code be released?
  3. What is the best way to keep a Ray Serve deployment alive when the code is updated periodically? If all files are baked into the image, we need to restart the Ray cluster through Helm and redeploy all the other deployments as well; if we use ray submit, RAM usage may become a problem.
I’m looking forward to your reply. Thanks in advance!

Hi @Rui, welcome to the forum!

Here are my thoughts on your questions:

  1. Generally, people manage their Ray clusters with KubeRay, an operator for running Ray clusters on K8s. You can then use a Python script to launch your deployments on the Ray cluster.
  2. After updating the model, the memory taken up by the old model should be garbage collected by Python as long as nothing is holding a reference to it. You can use the ray memory command to debug memory usage (see the documentation).
  3. There are a couple of options. One is to use ray submit as you’re doing and rely on Ray Serve’s rolling updates to keep deployments alive during the update. Another is to start a separate Ray cluster, deploy the updated deployments to it, and then switch traffic over to the new cluster.
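To illustrate point 2, here is a minimal plain-Python sketch (not Ray-specific) of why the old model's memory can be reclaimed after a redeploy: once the last strong reference to the old model object is dropped, the garbage collector frees it. The `Model` class and the weight size are hypothetical stand-ins.

```python
import gc
import weakref


class Model:
    """Hypothetical stand-in for a large ML model."""

    def __init__(self, version: int):
        self.version = version
        self.weights = bytearray(10 * 1024 * 1024)  # ~10 MB of fake weights


# Simulate a deployment holding the current model.
current_model = Model(version=1)

# Track the old model without keeping it alive.
old_ref = weakref.ref(current_model)

# "Redeploy": replacing the model drops the only strong reference to v1.
current_model = Model(version=2)
gc.collect()  # CPython frees it via refcounting anyway; collect() for good measure

print(old_ref() is None)      # True: the old model's memory has been released
print(current_model.version)  # 2
```

If the old deployment's code (or anything else, e.g. a module-level cache) still holds a reference to the old model, that memory will not be released, which is what the `ray memory` command helps track down.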

Do you mind giving a bit more info about your architecture and workload?

Hi @shrekris, thanks for your reply.
For (1), does that mean KubeRay is more suitable than the Ray Kubernetes Operator for launching a Ray cluster on an existing K8s cluster? If so, is the best approach to copy the ML model files into the Ray base image, install all the heavy dependencies, and then replace the rayproject/ray image with my custom image in ray-cluster.complete.yaml?
My question is: how do I launch the deployment once the Python script has already been copied into the custom image?

At a high level, we already have an AKS cluster managing different applications, with one namespace dedicated to a machine learning application. This ML application handles various computer vision tasks, which means heavy dependencies and large ML model files. We also have several use cases, so we need to deploy multiple Ray Serve deployments, and from time to time one of the deployments needs to be updated.

For (1), that’s correct: KubeRay is more suitable than the Ray K8s operator. You would layer your ML model files onto the Ray base image and replace rayproject/ray with this custom image.

You would then need to start the Serve deployments manually, using kubectl exec or Ray job submission.
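For concreteness, here is roughly what each option might look like. This is an ops sketch, not a tested recipe: the pod name, namespace, service name, and script path are all placeholders you would replace with your own values.

```shell
# Option A: run the deployment script that was baked into the image,
# inside the head-node pod (pod name and path are placeholders)
kubectl -n ray exec <head-node-pod> -- python /path/to/deploy_serve.py

# Option B: submit the script as a Ray job against the cluster's
# job server (port 8265 is the default Ray dashboard/jobs port)
ray job submit --address http://<head-node-service>:8265 -- python deploy_serve.py
```

Option B has the advantage that it can be run from CI/CD outside the cluster, as long as the job server port is reachable.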