Ray Serve on Kubernetes

I have built a pipeline of multiple Ray Serve deployments that run deep learning models on user input and respond with the models' output. First, there are several standalone Ray Serve deployment classes, each of which runs one model on the input (these endpoints can also be called by the user directly). Then there is another deployment class that gets a handle to each standalone deployment, runs them all, concatenates their outputs, and returns the result to the user. A rough sketch of this setup is included below. I now want to move these @serve.deployment classes to Kubernetes, where I can use autoscaling. I am able to start a minikube cluster and launch a Ray cluster using the provided helm chart, but I don't understand how to move my Ray Serve deployment classes onto that cluster and serve them to users. Could someone please point me to a resource or tutorial for doing this?
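For reference, my setup looks roughly like the sketch below (class and model names are placeholders, the real models are omitted, and the exact handle API depends on the Ray version):

```python
import ray
from ray import serve

ray.init(address="auto")
serve.start(detached=True)

@serve.deployment
class ModelA:
    async def __call__(self, text):
        # Run the first model on the input (real model omitted).
        # When called over HTTP this would receive a starlette Request
        # instead of a string; HTTP parsing omitted for brevity.
        return f"model_a({text})"

@serve.deployment
class ModelB:
    async def __call__(self, text):
        # Run the second model on the input (real model omitted).
        return f"model_b({text})"

@serve.deployment
class Composed:
    def __init__(self):
        # Handles to the standalone deployments.
        self.model_a = ModelA.get_handle(sync=False)
        self.model_b = ModelB.get_handle(sync=False)

    async def __call__(self, request):
        text = await request.body()
        # Fan out to both models, then concatenate their outputs.
        ref_a = await self.model_a.remote(text)
        ref_b = await self.model_b.remote(text)
        out_a, out_b = await ref_a, await ref_b
        return {"combined": [out_a, out_b]}

ModelA.deploy()
ModelB.deploy()
Composed.deploy()
```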

Hi @OAfzal, you can follow this example in the documentation to deploy on Kubernetes:
https://docs.ray.io/en/latest/serve/deployment.html#deploying-on-kubernetes
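
The overall shape is something like the following sketch: connect to the Ray cluster that the helm chart started and deploy Serve onto it (the exact options depend on your Ray version and on how you expose the head node):

```python
# deploy_serve.py -- run this on (or submit it to) the head node of the Ray
# cluster that the helm chart created inside Kubernetes.
import ray
from ray import serve

ray.init(address="auto")  # connect to the existing Ray cluster

# Bind the HTTP proxy to all interfaces so a Kubernetes Service / Ingress can
# route external traffic to it.
serve.start(detached=True, http_options={"host": "0.0.0.0", "port": 8000})

# ...then deploy your deployment classes as usual, e.g.:
# ModelA.deploy()
# ModelB.deploy()
# Composed.deploy()
```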

So I was able to get the above to work. I wanted to know: is it necessary to initialize a Ray cluster on an existing Kubernetes cluster to deploy a Ray Serve application, or could I just containerize multiple Ray Serve microservices separately? And if I did that, would model composition (communication between the containers using handles) still be possible?

I would also like to know: if a script loads a model from a saved_model file, how would that model file be passed to the cluster? I am referring to running a script on the cluster using ray submit. If the script sent with ray submit has a file dependency, how should that be handled?

@OAfzal to your first question: if you want to do model composition using Ray Serve, all of the models (deployments) need to be running on the same Ray cluster, since handles only work within a single cluster. Separate standalone containers would not be able to call each other through handles.

To your second question, for production usage I'd recommend baking the saved_model file into your container image and loading it from disk in the deployment constructor (see the sketch below). You could also use Ray's working_dir support, but that is more dynamic than building the file into the container, so there is more room for failure.
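
A rough sketch of the baked-in approach, assuming a TensorFlow SavedModel copied to /app/models in the Dockerfile (the path, framework, and request schema here are just placeholders):

```python
import numpy as np
import tensorflow as tf  # placeholder: use whatever framework your saved_model targets
from ray import serve

@serve.deployment
class MyModel:
    def __init__(self):
        # The model directory is baked into the container image, e.g. via
        #   COPY models/my_model /app/models/my_model
        # in the Dockerfile, so every replica loads it from local disk.
        self.model = tf.keras.models.load_model("/app/models/my_model")

    async def __call__(self, request):
        payload = await request.json()
        inputs = np.array(payload["inputs"])  # hypothetical request schema
        return {"prediction": self.model.predict(inputs).tolist()}
```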

So do I create a container for every model and deploy it using kubectl? Secondly, is it necessary to initialize the Ray cluster using the helm chart?

You don’t need to create an individual container for each model. You can use the helm chart to create a multi-pod Ray cluster, then deploy Serve on that cluster; the models will run across the different pods, roughly as in the sketch below.
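
For example, each deployment can declare its own replica count and resource needs, and Ray schedules those replicas across whichever pods have capacity (the numbers here are just illustrative):

```python
from ray import serve

@serve.deployment(num_replicas=2, ray_actor_options={"num_cpus": 1})
class ModelA:
    async def __call__(self, text):
        # Replicas of this deployment may land on different pods of the cluster.
        return f"model_a({text})"
```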

Do I submit the model deployment script using ray submit (like the script above)? But you mentioned creating a container image with the model in it. How would I do something like that? I am sorry if my questions are a bit basic; I just started working with Kubernetes, so I'm still learning.

Ah, what I was suggesting is to include the model file in your Dockerfile (or however you are building your container image). If you aren’t building the image yourself or want to get off the ground quickly, the working_dir option of runtime_env would be a good option too; see below for a rough example:
https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#using-local-files
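
A minimal sketch of that approach, assuming your project directory (including the saved_model files) lives next to the deploy script; the address and paths are placeholders, and runtime_env support depends on your Ray version:

```python
import ray
from ray import serve

# Upload the local project directory (including the saved_model files) to the
# cluster; workers download it and use it as their working directory.
ray.init(
    address="ray://<head-node-address>:10001",  # or "auto" when run on the head node
    runtime_env={"working_dir": "./my_project"},
)
serve.start(detached=True)

@serve.deployment
class MyModel:
    def __init__(self):
        # A relative path resolves inside the uploaded working_dir on the worker.
        self.model_path = "saved_model/"

MyModel.deploy()
```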