I have deployed a Ray cluster on Kubernetes on my local machine and am trying to connect to it from another pod (running business logic + models).
I am using ray.init("ray://example-cluster-ray-head:10001", namespace="ray")
and getting the stack trace below:
    ray.init("ray://example-cluster-ray-head:10001", namespace="ray")
  File "/usr/local/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/ray/worker.py", line 775, in init
    return builder.connect()
  File "/usr/local/lib/python3.9/site-packages/ray/client_builder.py", line 151, in connect
    client_info_dict = ray.util.client_connect.connect(
  File "/usr/local/lib/python3.9/site-packages/ray/util/client_connect.py", line 33, in connect
    conn = ray.connect(
  File "/usr/local/lib/python3.9/site-packages/ray/util/client/__init__.py", line 228, in connect
    conn = self.get_context().connect(*args, **kw_args)
  File "/usr/local/lib/python3.9/site-packages/ray/util/client/__init__.py", line 81, in connect
    self.client_worker = Worker(
  File "/usr/local/lib/python3.9/site-packages/ray/util/client/worker.py", line 130, in __init__
    self._connect_channel()
  File "/usr/local/lib/python3.9/site-packages/ray/util/client/worker.py", line 244, in _connect_channel
    raise ConnectionError("ray client connection timeout")
ConnectionError: ray client connection timeout
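Since the error is a connection timeout, one way to narrow it down might be to check from inside the client pod whether the head service's port accepts TCP connections at all, before involving Ray. The sketch below is only an illustration: can_reach is a hypothetical helper name (not part of Ray), and the host and port in the commented-out call are simply the ones from the ray.init call above.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout.

    Hypothetical helper for debugging; it says nothing about whether the
    Ray client server itself is healthy, only whether the port is reachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False


# Example call from inside the client pod (assumed host/port from the post):
# print(can_reach("example-cluster-ray-head", 10001))
```

If this returns False, the problem is likely in name resolution or networking between the pods rather than in the Ray script itself.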
Hi @lihost, can you provide a full script that reproduces this issue? You might also be able to reproduce it on your laptop by manually starting a Ray cluster with ray start --head and then calling ray.init("ray://127.0.0.1:10001", namespace="ray") from your script.
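A local reproduction along the lines suggested above could look roughly like the following. This is a minimal sketch, not the original poster's script: client_address, main, and ping are hypothetical names, and it assumes a head node was already started locally with ray start --head and that the Ray client server listens on port 10001, as in the address quoted above.

```python
def client_address(host: str, port: int = 10001) -> str:
    """Build a Ray client address of the form used in this thread."""
    return f"ray://{host}:{port}"


def main() -> None:
    # Imported here so the helper above can be used without Ray installed.
    import ray

    # Assumes `ray start --head` is already running on this machine.
    ray.init(client_address("127.0.0.1"), namespace="ray")

    @ray.remote
    def ping() -> str:
        return "pong"

    # Run one trivial remote task to confirm the client connection works.
    print(ray.get(ping.remote()))
    ray.shutdown()
```

Calling main() after the local head node is up should either print the task result or reproduce the same ConnectionError, which helps separate a Kubernetes networking problem from a Ray client problem.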
For running ray.server and ray.remote, I am following the subsection of the above-mentioned guide, i.e. using-ray-client-to-connect-from-within-the-kubernetes-cluster.
I see. It's good to know that this is probably a different category of issue, then. From your description, it looks like you're trying out our latest documentation sample script and it is not working as expected. Is my understanding correct, or are you using a different script for your use case than ray/job_example.py at master · ray-project/ray · GitHub?
❯ kubectl -n ray get rayclusters
NAME              STATUS    RESTARTS   AGE
example-cluster   Running   0          42h
❯ kubectl -n ray get pods
NAME                                    READY   STATUS    RESTARTS   AGE
example-cluster-ray-head-type-pnkp2     1/1     Running   0          42h
example-cluster-ray-worker-type-5pwgv   1/1     Running   0          42h
example-cluster-ray-worker-type-x2l54   1/1     Running   0          42h
❯ kubectl -n ray get service
NAME                       TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                       AGE
example-cluster-ray-head   ClusterIP   xxx.xxx.xxx.xxx   <none>        10001/TCP,8265/TCP,8000/TCP   4m16s
❯ kubectl get deployment ray-operator
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
ray-operator   1/1     1            1           42h
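The output above shows the head service living in the ray namespace. The thread does not say which namespace the client pod runs in, so the following is only a guess at one possible cause: in Kubernetes, a bare service name normally resolves only from within the same namespace, so a pod elsewhere would need a namespace-qualified name. The sketch below builds such an address; service_fqdn and ray_client_address are hypothetical helper names, and cluster.local is the common default cluster domain, which is an assumption about this cluster.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Return the fully qualified in-cluster DNS name of a Kubernetes service.

    cluster_domain defaults to "cluster.local", which is an assumption here.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def ray_client_address(service: str, namespace: str, port: int = 10001) -> str:
    """Build a ray:// address that uses the namespace-qualified service name."""
    return f"ray://{service_fqdn(service, namespace)}:{port}"


# Service name, namespace, and port taken from the kubectl output above.
print(ray_client_address("example-cluster-ray-head", "ray"))
```

The resulting string can be passed to ray.init in place of the short service name to see whether the timeout is a name resolution issue.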
@jiaodong I got this exact same issue on Ray 2.2.0 deployed on my local laptop cluster using minikube. How can I fix this? @lihost, could you share how you fixed the issue?