Ray Serve on Kubernetes

I have coded a pipeline of multiple Ray Serve deployments which run a deep learning model on the user input and respond with the model's output. First, there are multiple standalone Ray Serve deployment classes which each run one model on the input (these endpoints can also be called by the user directly). Then there is another deployment class which gets a handle to all the standalone modules, runs them, concatenates their outputs, and sends the result back to the user. I now want to move these @serve.deployment classes to K8s, where I can use autoscaling. I am able to start a minikube cluster and launch a Ray cluster using the provided Helm chart, but I don't understand how to move my Ray Serve deployment classes to K8s and serve them to users. Could someone please direct me to a resource or tutorial for doing this?
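
Roughly, the structure is something like this (a simplified sketch with placeholder model names, assuming a recent Ray Serve version with the bind/handle composition API):

```python
from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
class ModelA:
    def __call__(self, text: str) -> str:
        # Placeholder for running the first standalone model on the input.
        return f"A({text})"


@serve.deployment
class ModelB:
    def __call__(self, text: str) -> str:
        # Placeholder for running the second standalone model on the input.
        return f"B({text})"


@serve.deployment
class Composer:
    def __init__(self, model_a: DeploymentHandle, model_b: DeploymentHandle):
        # Handles to the standalone deployments, injected via .bind() below.
        self.model_a = model_a
        self.model_b = model_b

    async def __call__(self, request) -> str:
        text = (await request.json())["text"]
        # Run both standalone models and concatenate their outputs.
        out_a = await self.model_a.remote(text)
        out_b = await self.model_b.remote(text)
        return out_a + " " + out_b


# The composed application; deployed with `serve.run(app)` on the Ray cluster.
app = Composer.bind(ModelA.bind(), ModelB.bind())
```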

Hi @OAfzal, you can follow this example in the documentation to deploy on Kubernetes:
https://docs.ray.io/en/latest/serve/deployment.html#deploying-on-kubernetes

So I was able to get the above to work. I wanted to know whether it is necessary to initialize a Ray cluster on an already existing K8s cluster to deploy a Ray Serve application, or whether I could just containerize multiple microservices with Ray Serve for serving. Would that still allow model composition or communication between the containers using handles?

I would also like to know: if a script uses a saved_model file to load a model, how would that model file be passed to the cluster? I am referring to running a script on the cluster using ray submit. If the script sent with ray submit has a file dependency, how should that be handled?

@OAfzal to your first question: if you want to do model composition using Ray Serve, all of the models (deployments) should be running on the same Ray cluster.

To your second question: for production usage I'd recommend baking that saved_model file into your container image and loading it from disk in the deployment constructor. You could also use Ray's working_dir support, but this is more dynamic than building it into the container, so there is more potential for failure.
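
For example, something like this (a rough sketch with a hypothetical path, assuming a TensorFlow SavedModel baked into the image at /app/models/saved_model):

```python
import tensorflow as tf  # swap in whatever framework your saved_model uses

from ray import serve


@serve.deployment
class ModelDeployment:
    def __init__(self, model_path: str = "/app/models/saved_model"):
        # The path exists inside the container because the Dockerfile copies the
        # saved_model directory into the image; loading happens once per replica.
        self.model = tf.saved_model.load(model_path)

    def __call__(self, inputs):
        # Placeholder inference call; adapt to your model's signature.
        return self.model(inputs)
```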

So for every model, do I create a container and deploy it using kubectl? Secondly, is it necessary to initialize the Ray cluster using the Helm chart?

You don’t need to create an individual container for each model; you can use the Helm chart to create a multi-pod Ray cluster and then deploy Serve on that cluster (the models will run across the different pods).

Do I submit the model deployment script using ray submit (like in the script above)? But you mentioned creating a container image with the model in it. How would I do something like this? I am sorry if my questions are a bit basic; I just started working with K8s, so I'm trying to learn.

Ah, what I was suggesting is to include the model file in your Dockerfile (or however you are building your container image). If you aren't building the image yourself or want to get off the ground quickly, using the working_dir option of runtime_env would be a good option too:
https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#using-local-files
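
As a rough sketch of that option (hypothetical paths, assuming the local directory ./my_app contains your script and a pickled saved_model.pkl file, and a recent Ray Serve version):

```python
import pickle

import ray
from ray import serve

# Everything under ./my_app is uploaded to the cluster and becomes the working
# directory of the Ray workers, so relative paths resolve there.
ray.init(address="auto", runtime_env={"working_dir": "./my_app"})


@serve.deployment
class ModelDeployment:
    def __init__(self):
        # "saved_model.pkl" lives in the uploaded working_dir on every worker.
        with open("saved_model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def __call__(self, inputs):
        return self.model.predict(inputs)


serve.run(ModelDeployment.bind())
```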

Hi,
So, my situation is similar to the one discussed here. I have a pipeline, a model, and a set of dependent files needed for the pipeline to run. I am trying to deploy this to a Ray cluster which is installed on Kubernetes.

My understanding is that the ‘ray submit’ command places the file I pass as an argument onto the head node pod and runs it there. But, as I described above, I have a set of files along with the model file. So I see it is suggested to include the files in the Dockerfile and deploy it as a separate container.

Here are my questions

  • Can I just use python3 as the base image, or should I use rayproject as the base image (https://hub.docker.com/r/rayproject/ray)? Is there any doc I can refer to?
  • Could you share a sample YAML file to deploy this as a separate pod?
  • Would Ray still autoscale the pods when running as a separate pod?

Hi @SivaSankariRamamoort, we’ve recently updated the Ray Kubernetes documentation: Ray on Kubernetes — Ray 3.0.0.dev0. Can you see if it answers your questions? The getting started guide references some example YAML files. Autoscaling will still work as usual.

Hi Archit Kulkarni,

Thanks for taking the time to reply. But the link you have shared is about how to deploy a Ray cluster on Kubernetes. I’m looking for a doc on how to deploy Ray Serve on Kubernetes.

I think the answers to your questions should be the same whether you're using Ray Serve or any other Ray application, if I’m understanding them correctly. Once your Ray cluster is deployed, you can deploy Serve on it in the usual way.
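
For instance, a minimal sketch of "the usual way" (hypothetical address, assuming the head node's Ray Client port 10001 is reachable, e.g. via kubectl port-forward, and Ray >= 2.0):

```python
import ray
from ray import serve


@serve.deployment
class Hello:
    def __call__(self, request) -> str:
        return "hello from the Ray cluster on Kubernetes"


# Connect to the existing cluster instead of starting a local one.
ray.init(address="ray://127.0.0.1:10001")

# Deploy the Serve application on that cluster.
serve.run(Hello.bind())
```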

You might also be interested in RayService - KubeRay Docs, which you can test out with pip install "ray[serve, default]==2.0.0rc1".

I am facing something along these lines. I am creating a static Ray cluster and trying to deploy multiple independent models with independent routes on a single Ray cluster. I want to autoscale not only the workers but also the model replicas according to requests, and I am not using KubeRay.

There are not many resources on how to serve multiple models on a single static cluster.

Check out these docs on deploying multiple applications with Ray Serve: Deploy Multiple Applications — Ray 2.10.0. You can split each independent model into a separate application and run them on a single cluster.
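
A rough sketch of what that looks like (hypothetical names and routes, assuming Ray Serve 2.x multi-application support); the autoscaling_config handles scaling the model replicas, while the cluster autoscaler handles the worker nodes:

```python
from ray import serve


@serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 5})
class ModelOne:
    def __call__(self, request) -> str:
        return "output of model one"


@serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 5})
class ModelTwo:
    def __call__(self, request) -> str:
        return "output of model two"


# Each model is its own application with its own route on the same cluster.
serve.run(ModelOne.bind(), name="model_one", route_prefix="/model_one")
serve.run(ModelTwo.bind(), name="model_two", route_prefix="/model_two")
```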