Severity of the issue: Low (it annoys or frustrates me for a moment).
Greetings! I’m still testing the stability of a Ray cluster that I set up on a few computers connected over a local network. I start the cluster by running ray start and monitor it from both the dashboard and ray status on the head node.
An issue I ran into is that ray status worked initially after the cluster was started, but then began failing with "Ray cluster is not found at 192.168.1.100:6379" due to deadline-exceeded errors. Interestingly, the cluster was actually still running: existing and new jobs still finished, and the dashboard kept updating with the workload. It would be nice to have ray status working reliably, since it shows the number of pending tasks, which is not available on the dashboard.
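While ray status is failing, I can still confirm from Python that the cluster is alive, roughly like this (an illustrative snippet run on one of the cluster machines, not the exact code I use):

# Illustrative check that the cluster is still serving work while `ray status`
# reports "Ray cluster is not found": attach to the running cluster and list nodes.
import ray

ray.init(address="auto")  # attach to the existing cluster started with `ray start`

alive_nodes = [node for node in ray.nodes() if node["Alive"]]
print(f"Alive nodes: {len(alive_nodes)}")
print(f"Cluster resources: {ray.cluster_resources()}")

ray.shutdown()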
Error message:
Traceback (most recent call last):
  File "/home/xyz/.local/lib/python3.10/site-packages/ray/_private/gcs_utils.py", line 120, in check_health
    resp = stub.CheckAlive(req, timeout=timeout)
  File "/home/xyz/.local/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/xyz/.local/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.DEADLINE_EXCEEDED
    details = "Deadline Exceeded"
    debug_error_string = "{"created":"@1666548607.243184041","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
>
Ray cluster is not found at 192.168.1.100:6379
I’m using Ray 2.0.0.
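To check whether this is just a short RPC deadline, I also tried calling the same health check that appears in the traceback, but with a much longer timeout. This is only a sketch against the private helper ray._private.gcs_utils.check_health, so the exact arguments are my assumption based on the traceback:

# Sketch: re-run the health check from the traceback with a longer deadline.
# check_health(address, timeout=...) is a private Ray helper, so this signature
# is assumed from the traceback above and may differ between Ray versions.
from ray._private import gcs_utils

GCS_ADDRESS = "192.168.1.100:6379"  # head node GCS address from the error message

try:
    alive = gcs_utils.check_health(GCS_ADDRESS, timeout=30)
    print(f"GCS reachable: {alive}")
except Exception as exc:
    print(f"Health check still failed with a 30 s deadline: {exc!r}")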