I am running a Ray cluster on Kubernetes using the operator image rayproject/ray:nightly
and nodes with the image rayproject/ray:nightly-py37-cpu.
If I use port-forward I can happily connect to the cluster using:
ray.util.connect('127.0.0.1:10001')
But when I try to expose the service using either a LoadBalancer or a NodePort, I get the following error:
Got Error from data channel -- shutting down: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Exception iterating responses: "
    debug_error_string = "{"created":"@1617666164.593330000","description":"Error received from peer ipv4:150.239.10.227:10001","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Exception iterating responses: ","grpc_status":2}"
Exception in thread Thread-48:
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python@3.7/3.7.10_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.7/site-packages/ray/util/client/dataclient.py", line 90, in _data_main
    raise e
  File "/usr/local/lib/python3.7/site-packages/ray/util/client/dataclient.py", line 64, in _data_main
    for response in resp_stream:
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 416, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 706, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "Exception iterating responses: "
    debug_error_string = "{"created":"@1617666164.593330000","description":"Error received from peer ipv4:150.239.10.227:10001","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Exception iterating responses: ","grpc_status":2}"
I think this is similar to the thread "Exposing dashboard, client and ray serve in kubernetes through NodePort".
It looks like the client does not work when the connection goes through an intermediary such as a LoadBalancer or NodePort service.
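One way to narrow this down is to check whether the exposed port accepts a plain TCP connection at all, separately from the gRPC stream. The following is only a minimal sketch: the helper names port_is_reachable and client_address are hypothetical (they are not part of Ray), and the host and port passed in are whatever the LoadBalancer or NodePort actually exposes.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper, not part of Ray. It only proves that the
    Kubernetes service forwards TCP; it says nothing about gRPC.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def client_address(host: str, port: int = 10001) -> str:
    """Build the 'host:port' string passed to ray.util.connect.

    Port 10001 is the client port used in the post above; a NodePort
    service would normally expose a different external port.
    """
    return f"{host}:{port}"


# Example use (assumes Ray is installed and the address is the exposed one):
#   import ray
#   if port_is_reachable("127.0.0.1", 10001):
#       ray.util.connect(client_address("127.0.0.1"))
```

If this check passes but ray.util.connect still fails as above, the TCP path is fine and the failure is happening at the gRPC level, which would be consistent with the "Error received from peer" message in the log.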