Severity: Low. It annoys or frustrates me for a moment.
Greetings! I’m still testing the stability of a Ray cluster that I set up on a few computers connected over a local network. I start the cluster with ray start and monitor it from both the dashboard and ray status on the head node.
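For context, the commands are roughly the following (a sketch; the port and dashboard flags are assumptions about my setup, and 192.168.1.100 is the head node's address from the error below):

# on the head node
ray start --head --port=6379 --dashboard-host=0.0.0.0
# on each of the other machines, pointing at the head node
ray start --address=192.168.1.100:6379
# monitoring, run on the head node
ray status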
The issue I ran into is that the ray status command worked initially after the cluster was started, but after a while it began failing with "Ray cluster is not found at 192.168.1.100:6379" due to deadline-exceeded errors. Interestingly, the cluster itself was still running: existing and new jobs kept finishing, and the dashboard kept updating with the workload. It would be nice to have ray status working reliably, since it shows the number of pending tasks, which is not available on the dashboard.
Error message:
Traceback (most recent call last):
  File "/home/xyz/.local/lib/python3.10/site-packages/ray/_private/gcs_utils.py", line 120, in check_health
    resp = stub.CheckAlive(req, timeout=timeout)
  File "/home/xyz/.local/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/xyz/.local/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.DEADLINE_EXCEEDED
    details = "Deadline Exceeded"
    debug_error_string = "{"created":"@1666548607.243184041","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
>
Ray cluster is not found at 192.168.1.100:6379
I’m using Ray 2.0.0.
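In case it helps with diagnosis, the call that times out appears to be the GCS health check shown in the traceback (check_health in ray/_private/gcs_utils.py). A rough way I can probe it directly from the head node with a longer timeout (the address-plus-timeout signature is my assumption based on the traceback):

# sketch: should print True if the GCS answers within the timeout
python -c "from ray._private.gcs_utils import check_health; print(check_health('192.168.1.100:6379', timeout=30))"

If this still times out even with a generous timeout while jobs keep running, that would suggest the GCS is up but slow to answer these health probes rather than actually down.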