ConnectionError: ray client connection timeout, Ray 1.9.0, Kubernetes

I have deployed a Ray cluster on Kubernetes on my local machine and am trying to connect to it from another pod (running the business logic and models).

I am using ray.init("ray://example-cluster-ray-head:10001", namespace="ray")

and getting the stack trace below:

ray.init("ray://example-cluster-ray-head:10001", namespace="ray")
File "/usr/local/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
  return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/ray/worker.py", line 775, in init
  return builder.connect()
File "/usr/local/lib/python3.9/site-packages/ray/client_builder.py", line 151, in connect
  client_info_dict = ray.util.client_connect.connect(
File "/usr/local/lib/python3.9/site-packages/ray/util/client_connect.py", line 33, in connect
  conn = ray.connect(
File "/usr/local/lib/python3.9/site-packages/ray/util/client/__init__.py", line 228, in connect
  conn = self.get_context().connect(*args, **kw_args)
File "/usr/local/lib/python3.9/site-packages/ray/util/client/__init__.py", line 81, in connect
  self.client_worker = Worker(
File "/usr/local/lib/python3.9/site-packages/ray/util/client/worker.py", line 130, in __init__
  self._connect_channel()
File "/usr/local/lib/python3.9/site-packages/ray/util/client/worker.py", line 244, in _connect_channel
  raise ConnectionError("ray client connection timeout")
ConnectionError: ray client connection timeout
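The timeout means the client never established its gRPC channel to port 10001 on the head node. Before digging into Ray itself, a plain TCP check run from the client pod can confirm whether the head service is reachable at all — a minimal sketch, stdlib only (the host and port mirror the `ray.init()` address above):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes DNS failures (socket.gaierror) and refusals
        return False

# Run this from the client pod against the Ray head service.
print(tcp_reachable("example-cluster-ray-head", 10001))
```

If this prints False, the problem is networking (DNS, service, or port), not the Ray client itself.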

Hi @lihost, can you provide a full script for reproducing this issue? In addition, you might be able to reproduce it on your laptop by manually starting a Ray cluster with ray start --head and then calling ray.init("ray://127.0.0.1:10001", namespace="ray") from your script.

It looks similar to another P0 issue that we’re addressing now: [Bug] [Serve] Ray hangs on API methods · Issue #20971 · ray-project/ray · GitHub, where it can also get stuck and time out on the second iteration of init_ray() in the script.

Hi @jiaodong, thanks for your response.

I ran it manually again and it works fine with ray start --head, but when deploying it on Kubernetes, it throws this error.

I have followed the steps mentioned at Deploying on Kubernetes — Ray v2.0.0.dev0 to set up the Ray cluster within Kubernetes.

For running ray.server and ray.remote, I am following the above guide’s subsection, i.e. using-ray-client-to-connect-from-within-the-kubernetes-cluster.
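As a sanity check on the connection string used there, a small hypothetical helper (the name `check_ray_address` is my own, not from the guide) can parse the ray:// address and confirm the service name resolves from inside the client pod — a sketch, stdlib only:

```python
import socket
from urllib.parse import urlparse

def check_ray_address(address: str):
    """Parse a ray:// address and verify the host resolves via DNS.

    Raises ValueError for a bad scheme, and socket.gaierror if DNS
    resolution fails (a common cause of client connection timeouts).
    """
    parsed = urlparse(address)
    if parsed.scheme != "ray":
        raise ValueError(f"expected a ray:// address, got {address!r}")
    host = parsed.hostname
    port = parsed.port or 10001  # default Ray client server port
    socket.getaddrinfo(host, port)  # raises socket.gaierror on DNS failure
    return host, port
```

If DNS resolution fails here, ray.init() will sit in its retry loop and eventually raise the same "ray client connection timeout".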

I hope the docs haven’t changed.

I see — good to know. That should be a different category of issue, then. From your description it looks like you’re trying our latest documentation sample script and it is not working as expected. Is my understanding correct, or are you using a different script for your use case other than ray/job_example.py at master · ray-project/ray · GitHub?

cc: @architkulkarni

Yes, that’s pretty much what I am trying to do as a POC for now.

@lihost what output do you see when executing the steps in Deploying on Kubernetes — Ray v2.0.0.dev0, such as:

kubectl -n ray get rayclusters

kubectl -n ray get pods

kubectl -n ray get service

kubectl get deployment ray-operator

@Dmitri has the most context about our Ray-on-Kubernetes deployment.

Thanks @jiaodong, here is what I can see:

❯ kubectl -n ray get rayclusters
NAME              STATUS    RESTARTS   AGE
example-cluster   Running   0          42h


❯ kubectl -n ray get pods
NAME                                    READY   STATUS    RESTARTS   AGE
example-cluster-ray-head-type-pnkp2     1/1     Running   0          42h
example-cluster-ray-worker-type-5pwgv   1/1     Running   0          42h
example-cluster-ray-worker-type-x2l54   1/1     Running   0          42h


❯ kubectl -n ray get service
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
example-cluster-ray-head   ClusterIP   xxx.xxx.xxx.xxx   <none>        10001/TCP,8265/TCP,8000/TCP   4m16s


❯ kubectl get deployment ray-operator
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
ray-operator   1/1     1            1           42h

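One thing worth checking given the service output above: the bare name example-cluster-ray-head only resolves from a client pod inside the same Kubernetes namespace (ray). From any other namespace, the namespaced or fully qualified service DNS name is needed. A hypothetical helper (my own, following the standard Kubernetes service DNS convention) listing the candidate addresses to try:

```python
def candidate_addresses(service: str, namespace: str, port: int = 10001):
    """Build Ray client addresses from least to most qualified.

    Follows the standard Kubernetes service DNS convention:
    <service>.<namespace>.svc.cluster.local
    """
    return [
        f"ray://{service}:{port}",                                # same namespace only
        f"ray://{service}.{namespace}:{port}",                    # cross-namespace
        f"ray://{service}.{namespace}.svc.cluster.local:{port}",  # fully qualified
    ]

print(candidate_addresses("example-cluster-ray-head", "ray"))
```

Note that the namespace="ray" argument to ray.init() is a Ray namespace, which is unrelated to the Kubernetes namespace; they merely happen to share the same name here.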

@lihost Are you still experiencing this issue?

No longer seeing this issue.