Ray Serve requests returning ECONNREFUSED

Hi,

I’m running Ray on a Kubernetes cluster using the KubeRay operator. Everything is working well, but I’m now stuck on an issue when trying to access the Serve endpoint from pods in the same Kubernetes cluster.

When I port-forward to the Ray head (port 8000), I can reach the service on port 8000 and run inference with no issues. But when I try to access it from other pods, using either “ray-cluster-head-svc.ray-workload.svc.cluster.local” or the service’s ClusterIP, I get an ECONNREFUSED error, and only on port 8000.

The service/deployment is running with -h 0.0.0.0.

I can access the dashboard from other pods in the cluster normally, using http://ray-cluster-head-svc.ray-workload.svc.cluster.local:8265, but I have no luck reaching port 8000.
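A quick way to narrow this down from a client pod is to probe both ports at the TCP level. The sketch below is a minimal, hypothetical helper (`probe` is an illustrative name, not part of Ray or Kubernetes) that uses only the Python standard library. A "refused" result for 8000 alongside "open" for 8265 typically means the connection does reach the head pod, but nothing is listening on port 8000 at the pod’s own address.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: the host is reachable, but no
        # socket is listening on this port for the destination address.
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall, a missing Service port,
        # or an unroutable address.
        return "timeout"
    except OSError as exc:
        # DNS failures and other network errors end up here.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
```

For example, run `probe("ray-cluster-head-svc.ray-workload.svc.cluster.local", 8000)` from another pod and compare it with the same call for port 8265.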

Any suggestions on what to try next?

I couldn’t find a way to fix this from within Ray.

If anyone else is facing this issue: the problem seems to be that, for some reason, -h 0.0.0.0 does not make the HTTP server listen on all network interfaces. It only listens on lo, not on eth0.
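One way to confirm which address the Serve HTTP proxy is actually bound to, without installing extra tools, is to read the kernel’s TCP socket table from inside the head pod. The sketch below is a hypothetical helper (`listeners` is an illustrative name) that parses `/proc/net/tcp`, where state `0A` means LISTEN and addresses and ports are hex-encoded. It only covers IPv4; IPv6 sockets are listed separately in `/proc/net/tcp6`.

```python
import socket
import struct

LISTEN = "0A"  # TCP state code for LISTEN in /proc/net/tcp


def listeners(port: int, table: str) -> list[str]:
    """Return the local IPv4 addresses listening on `port`, given the
    text of /proc/net/tcp."""
    found = []
    for line in table.splitlines()[1:]:  # the first line is a header
        fields = line.split()
        # fields[1] is local_address, fields[3] is the connection state
        if len(fields) < 4 or fields[3] != LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")
        if int(port_hex, 16) != port:
            continue
        # The kernel prints the raw 32-bit address in native byte order,
        # so repack it the same way before converting to dotted form.
        packed = struct.pack("=I", int(addr_hex, 16))
        found.append(socket.inet_ntoa(packed))
    return found
```

Inside the head pod, `listeners(8000, open("/proc/net/tcp").read())` returning `['127.0.0.1']` would match the loopback-only behaviour described above, while `['0.0.0.0']` would mean the proxy is listening on all interfaces.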

I added a postStart hook to the head container that uses redir to forward connections arriving on eth0 to the loopback address:

    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sudo apt-get update && sudo apt-get install -y redir net-tools && ifconfig eth0 | awk -F ' *|:' '/inet /{print $3\":8000 \" \"127.0.0.1:8000\"}' | xargs redir"]
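For reference, the `ifconfig eth0 | awk ...` part of the command only extracts the pod’s IPv4 address, so that `redir` receives arguments of the form `<eth0-ip>:8000 127.0.0.1:8000`. Because the awk field index depends on the ifconfig output format, here is a hypothetical Python equivalent (`ifconfig_ipv4` is an illustrative name, not from the original post). It accepts both the older `inet addr:` style and the newer `inet ... netmask` style, and it skips `inet6` lines.

```python
import re
from typing import Optional

# "inet" must be followed by whitespace, so "inet6" lines never match.
# The optional "addr:" covers the older net-tools output format.
_INET = re.compile(r"\binet\s+(?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})")


def ifconfig_ipv4(output: str) -> Optional[str]:
    """Return the first IPv4 address found in `ifconfig <iface>` output."""
    match = _INET.search(output)
    return match.group(1) if match else None
```

Fed with `subprocess.run(["ifconfig", "eth0"], capture_output=True, text=True).stdout`, the result could be formatted as `f"{ip}:8000 127.0.0.1:8000"` to build the same redir arguments.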

I’m installing net-tools and redir here, but that obviously isn’t necessary if the ray-head image already has them installed.

It’s a pretty hacky workaround, but it works.
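If installing redir in the image is not an option, the same port forwarding can be sketched with Python’s standard-library asyncio, since the Ray image already ships Python. This is a minimal sketch, not a Ray feature: `start_forwarder` is a hypothetical name, and it assumes the Serve proxy listens on 127.0.0.1:8000 as described above. Each accepted connection is relayed in both directions to the loopback address.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader,
                writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then half-close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # pass the sender's EOF along
    except OSError:
        writer.close()  # connection reset or similar: tear this side down


async def start_forwarder(listen_host: str, listen_port: int,
                          target_host: str = "127.0.0.1",
                          target_port: int = 8000) -> asyncio.AbstractServer:
    """Listen on listen_host:listen_port and relay every connection to
    target_host:target_port (the loopback-only Serve proxy by default)."""

    async def handle(client_r: asyncio.StreamReader,
                     client_w: asyncio.StreamWriter) -> None:
        try:
            up_r, up_w = await asyncio.open_connection(target_host, target_port)
        except OSError:
            client_w.close()  # target not listening: drop the client
            return
        try:
            # Relay both directions concurrently until both reach EOF.
            await asyncio.gather(_pipe(client_r, up_w), _pipe(up_r, client_w))
        finally:
            client_w.close()
            up_w.close()

    return await asyncio.start_server(handle, listen_host, listen_port)
```

To mirror the redir command, it would be started in the head pod on the pod IP rather than 0.0.0.0, for example `server = await start_forwarder(socket.gethostbyname(socket.gethostname()), 8000)` followed by `await server.serve_forever()`; in Kubernetes the pod hostname usually resolves to the pod IP through `/etc/hosts`. Binding the specific pod address, as the redir command does, avoids clashing with the existing 127.0.0.1:8000 listener.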

@geraldoramos thanks for the investigation. Can I ask how you deployed the Serve app through the KubeRay operator? Are you using RayService (see RayService - KubeRay Docs)?