Serving Ray on Kubernetes from Another App

Hi guys, I am trying to get Ray (in one Kubernetes pod) to connect to the head of a Ray cluster in another pod. I also have a backend that I am trying to set up using a client. I am not sure what I am doing wrong, but here is the code I currently have. The Ray cluster head is exposed using a service. Should I be using ray.init in my current pod, or should it be ray.client.connect? Once I get the client to connect, do I need to start ray.serve inside the local pod, or should it be started in the head pod? I am confused about that, and then there are the backends. Would someone be able to clarify this? My intent is to have the function run remotely on the Ray cluster, invoked by client calls into my main web app.

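import ray
from ray import serve

ray.init(num_cpus=12, num_gpus=1)
client = serve.start(detached=True, http_host="", http_port=8000)
# The opening of the next call did not survive in the post; only its config argument remains:
#     config={"max_concurrent_queries": None, "num_replicas": 2})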

I am not sure if this pattern is right. Basically, I want the load balancer to hit my proprietary app, which serves the backend and then runs the compute work on the Ray cluster.

Or, from what I have understood so far, I need to start ray.serve on the head node and create the endpoint on the head node as well. In that case I won't need the proprietary app to serve endpoints to the ML models.

My application, which contains the code above, runs in a pod. The Ray cluster runs on the same k8s cluster. My app is exposed using a service, and the Ray cluster head node is also exposed via a service. When the app gets a call, it runs the ML compute functions on the Ray cluster. I am not sure if this is the right approach.


Do I understand your setup correctly?
[Diagram]

Ray Head

  • Has been started with ray.init(namespace="serve")
  • Has the address HOST_ADDRESS

Ray App

  • Needs to be started with ray.init(address=HOST_ADDRESS, namespace="serve")
  • Start Ray Serve: serve.start(detached=True, http_options={"location": "EveryNode"}); see "Calling Deployments via HTTP and Python" in the Ray v2.0.0.dev0 docs. If you do not provide this, the HTTP server will only run on your head node
  • Run the code that starts your backends
  • Make sure they are able to communicate via the required ports (I am not so sure how that works in Kubernetes; with pure Docker you need to open the ports…)
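
The Ray App steps above could be sketched roughly as follows. This is only a sketch for Ray 1.4/1.5: the environment variable RAY_HEAD_ADDRESS and all helper names are made up for illustration, and the exact address format depends on how the head service is exposed in your cluster.

```python
import os


def resolve_head_address(env=None):
    """Return the head node address from the (hypothetical) RAY_HEAD_ADDRESS variable."""
    env = os.environ if env is None else env
    address = env.get("RAY_HEAD_ADDRESS", "").strip()
    if not address:
        raise ValueError("RAY_HEAD_ADDRESS is not set; it should point at the Ray head service")
    return address


def serve_http_options(every_node=True):
    """Build http_options for serve.start; 'HeadOnly' keeps the HTTP server on the head node."""
    return {"location": "EveryNode" if every_node else "HeadOnly"}


def connect_and_start_serve(address):
    """Connect this process to the existing cluster and start a detached Serve instance."""
    # Imported here so the helpers above can be used without Ray installed.
    import ray
    from ray import serve

    ray.init(address=address, namespace="serve")
    serve.start(detached=True, http_options=serve_http_options())
```

In the app pod this would be called once at startup, for example connect_and_start_serve(resolve_head_address()), before the backends are created.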


If the backends should only execute on the head node, you need to make sure they will be deployed only there, so make sure you set up the resources properly when starting the cluster. I am not 100% sure, but I think the HTTP server requires a CPU to be able to start up; maybe that has changed… However, I would just use custom resources when initializing the Ray head node and add these custom resources to the Serve backends. This makes sure the code will be executed on the head node only (see Custom Resources in the Ray docs).
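
One way the custom-resource idea might look, assuming Ray 1.4/1.5 and a made-up resource name head_only. The head node has to advertise that resource itself (for example with ray start --head --resources='{"head_only": 1}'), and each deployment then requests a small fraction of it so that only the head node can host its replicas.

```python
HEAD_RESOURCE = "head_only"  # hypothetical custom resource name, advertised by the head node only


def head_only_actor_options(amount=0.01):
    """Build ray_actor_options that only a node advertising HEAD_RESOURCE can satisfy."""
    if amount <= 0:
        raise ValueError("amount must be positive, otherwise the resource does not constrain placement")
    return {"resources": {HEAD_RESOURCE: amount}}


def make_head_only_deployment(func_or_class, name):
    """Wrap func_or_class as a Serve deployment pinned to the head node (not yet deployed)."""
    from ray import serve  # lazy import: only needed when actually deploying

    return serve.deployment(name=name, ray_actor_options=head_only_actor_options())(func_or_class)
```

With a fraction of 0.01 per replica and 1 unit advertised on the head, many replicas still fit on that node; a larger fraction caps the number of replicas accordingly.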

Which Ray version are you using? The answer I added is for Ray 1.4.1 (I am not sure since when namespaces have been supported).

I am using version 1.5, but I can downgrade to 1.4.
The Ray app has the server, the backend, and the actors. Yes, the way you have drawn it is correct. There are also Ray worker nodes running in separate pods. The Ray head pod is started by the Ray Operator; I did not do anything special for that. The HOST_ADDRESS is the address of the container of my specific app, right?

1.5 is fine; however, you are using the deprecated way of launching your Serve backend. HOST_ADDRESS should be the address of your head node: you need to tell Ray where the cluster head is located.
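
For reference, the newer deployment API in Ray 1.4/1.5 looks roughly like the sketch below. The class ModelBackend and its placeholder computation are invented here, num_replicas=2 mirrors the config in the question, and the sketch assumes the process has already connected with ray.init(address=HOST_ADDRESS, namespace="serve") and called serve.start.

```python
class ModelBackend:
    """Placeholder for the ML compute; replace the body of __call__ with the real model."""

    def __call__(self, values):
        # Stand-in computation so the class can be exercised without Ray.
        return sum(values) / len(values)


def deploy_and_get_handle(cls=ModelBackend, num_replicas=2):
    """Deploy cls on the connected cluster and return a handle for Python calls."""
    from ray import serve  # lazy import; needs ray.init(...) and serve.start(...) first

    deployment = serve.deployment(name=cls.__name__, num_replicas=num_replicas)(cls)
    deployment.deploy()
    return deployment.get_handle()


def run_on_cluster(handle, values):
    """Call the deployment from the web app and block until the result is back."""
    import ray

    return ray.get(handle.remote(values))
```

The web app would keep the handle around and call run_on_cluster(handle, data) inside its own request handlers, so the load balancer still only talks to the app while the compute runs on the Ray cluster.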

Thank you Tanja. Got it working.
