Hi guys, I am trying to get Ray (in one Kubernetes pod) to connect to a Ray cluster whose head node runs in another pod. I also have a backend that I am trying to set up using a client. I am not sure what I am doing wrong, but here is the code I currently have. The Ray cluster head is exposed using a Service. Should I be calling ray.init in my current pod, or should it be ray.client.connect? Once I get the client to connect, do I need to start Ray Serve inside the local pod, or should it be started in the head pod? I'm confused about that. And then come the backends. Would someone be able to clarify this? My intent is to have the functions that get invoked by client calls into my main web app run remotely on the Ray cluster.
import ray
from ray import serve

ray.init(num_cpus=12, num_gpus=1)
client = serve.start(detached=True, http_host="0.0.0.0", http_port=8000)
# CompositeCheck is my backend class, defined elsewhere in the app.
client.create_backend("composite_check",
                      CompositeCheck,
                      config={"max_concurrent_queries": None,
                              "num_replicas": 2})
client.create_endpoint("composite",
                       backend="composite_check",
                       route="/composite",
                       methods=["POST"])
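In case it helps frame the question: my understanding is that ray.init(num_cpus=..., num_gpus=...) starts a fresh, single-node Ray instance inside my app's pod rather than attaching to the existing cluster. This is a sketch of what I think the connection should look like instead (the service name and port here are assumptions, not my actual config):

```python
# Hypothetical k8s Service name and port for the Ray head; adjust to your cluster.
RAY_HEAD_SERVICE = "ray-head-svc"   # assumed Service name fronting the head pod
RAY_HEAD_PORT = 6379                # default Ray head port

def head_address(host: str, port: int) -> str:
    """Build the "host:port" string that ray.init(address=...) expects."""
    return f"{host}:{port}"

def connect_and_serve():
    # Imported lazily so the helper above can be used without Ray installed.
    import ray
    from ray import serve

    # Connect to the EXISTING cluster instead of starting a new local one.
    ray.init(address=head_address(RAY_HEAD_SERVICE, RAY_HEAD_PORT))

    # detached=True keeps the Serve instance alive on the cluster even after
    # this driver process exits.
    return serve.start(detached=True, http_host="0.0.0.0", http_port=8000)
```

Is that the right direction, or is this where ray.client.connect comes in?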
I am not sure if this pattern is right. Basically I want the load balancer to hit my proprietary app, which serves the backend, and which then runs the compute work on the Ray cluster.
Or, from what I have understood so far, I need to start Ray Serve on the head node and create the endpoint there as well. In that case I wouldn't need my proprietary app serving endpoints to the ML models.
My application, which contains the above code, runs in a pod. The Ray cluster is running on the same k8s cluster. My app is exposed using a Service, and the Ray cluster head node is also exposed via a Service. When the app gets a call, it runs the ML compute functions on the Ray cluster. I'm not sure if this is the right approach.
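For reference, this is roughly how I picture the head node's Service, assuming the default Ray ports (the name and selector labels below are made up and would need to match the actual deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ray-head-svc          # hypothetical name for the head Service
spec:
  selector:
    app: ray-head             # must match the head pod's labels
  ports:
    - name: head              # Ray head port, used by ray.init(address=...)
      port: 6379
    - name: serve-http        # Serve's HTTP proxy (http_port=8000 above)
      port: 8000
    - name: client            # Ray Client server port, if that's the right path
      port: 10001
```

Does the app pod talk to 6379, or should only the external load balancer route to 8000 on the head?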
Thanks.