Job submit error

I have a Python file, run.py, with the following code:

import ray

ray.init(address="192.168.0.169:10001", runtime_env = {"working_dir": "./"})
data = ray.data.read_csv("../data.csv")

When I run the command

python run.py

I got the following error:

2023-04-17 23:31:07,633	INFO worker.py:1364 -- Connecting to existing Ray cluster at address: 92.168.0.169:10001...
2023-04-17 23:31:07,637	ERROR utils.py:1452 -- Internal KV Get failed
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/utils.py", line 1439, in internal_kv_get_with_retry
    result = gcs_client.internal_kv_get(key, namespace)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/gcs_utils.py", line 198, in wrapper
    return f(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/gcs_utils.py", line 290, in internal_kv_get
    reply = self._kv_stub.InternalKVGet(req, timeout=timeout)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNIMPLEMENTED
	details = "Method not found!"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:192.168.0.169:10001 {grpc_message:"Method not found!", grpc_status:12, created_time:"2023-04-17T23:31:07.636671321-07:00"}"

But if I modify the code as below:

import ray

ray.init(address="auto")
data = ray.data.read_csv("../data.csv")

And run the command:

ray job submit --address=192.168.0.169:6379 --working-dir ./ -- python run.py

It successfully finished the job.

I started getting the above error after I upgraded Ray to version 2.3.0. Could someone help me figure out why I get it? Thanks a lot!

Was there a typo in the original script in case 1?

Shouldn’t it be 192.168.xxx rather than 92.168.xxx?

Oh, I lost the "1" when I pasted it here. I did set the address to 192.168.0.169:6379 when I ran the script.

@architkulkarni Is there a difference in the job CLI / SDK behaviour? This looks surprising to me.

I think there is some weird interaction here with how we process the address. It looks like a GCS address is being provided to --address and a Ray client address was being provided in ray.init(), and it was working before by chance.
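To make the distinction between these address kinds concrete, here is a rough sketch of a helper that labels an address string. The function name `describe_address` and the returned labels are made up for illustration and are not part of Ray. The port numbers are Ray's usual defaults (6379 for GCS, 10001 for the Ray Client server, 8265 for the dashboard / Jobs API), and a given cluster may be configured differently.

```python
# Hypothetical helper, not a Ray API: labels an address string by its likely role.
# Port numbers are Ray's usual defaults; your cluster may use other ports.
DEFAULT_PORT_ROLES = {
    6379: "gcs",          # head node GCS address
    10001: "ray-client",  # Ray Client server
    8265: "jobs-api",     # dashboard / Jobs API server
}


def describe_address(address: str) -> str:
    """Return a short label for the kind of address this string looks like."""
    if address == "auto":
        return "auto"
    if address.startswith("ray://"):
        return "ray-client"
    if address.startswith(("http://", "https://")):
        return "jobs-api"
    # Bare host:port. With no scheme, the default port is the only hint.
    _host, _sep, port = address.rpartition(":")
    if port.isdigit() and int(port) in DEFAULT_PORT_ROLES:
        return "bare host:port on the default " + DEFAULT_PORT_ROLES[int(port)] + " port"
    return "unknown"


print(describe_address("192.168.0.169:10001"))
print(describe_address("ray://192.168.0.169:10001"))
print(describe_address("http://192.168.0.169:8265"))
```

Read this way, the failing script passed a bare host:port that points at the Ray Client port, which is consistent with the "Method not found!" reply from the peer at 192.168.0.169:10001.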

The recommended way is:

  • In the job API, use http://192.168.0.169:8265 as the address. This is the address of the Ray API server, and please make sure http:// is included.
  • Also, in the entrypoint script, don’t specify any address in ray.init().
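For reference, the two recommendations above could look roughly like this with the Python job submission SDK. This is a sketch that assumes the dashboard is reachable on the default port 8265 at the IP address from this thread. The wrapper name `submit_run_py` is made up for illustration. `JobSubmissionClient` and `submit_job` are the SDK calls. The entrypoint script run.py would then call plain `ray.init()` with no address.

```python
# Address of the Ray API (dashboard) server; note the http:// prefix.
JOB_SERVER_ADDRESS = "http://192.168.0.169:8265"


def submit_run_py(address: str = JOB_SERVER_ADDRESS) -> str:
    """Submit run.py as a Ray job and return the submission ID.

    Hypothetical wrapper for illustration. Ray is imported lazily so that
    this file can be loaded on a machine without a reachable cluster.
    """
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(
        entrypoint="python run.py",
        # Upload the current directory so run.py is available on the cluster.
        runtime_env={"working_dir": "./"},
    )

# Usage, from the directory that contains run.py:
#   submission_id = submit_run_py()
```

The CLI equivalent is `ray job submit --address http://192.168.0.169:8265 --working-dir ./ -- python run.py`.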

We should improve the error handling here. @veryhannibal can you see if this works?

Thanks for your help. I think it was my mistake. I modified the code as below, and it works now.

ray.init(address="ray://192.168.0.169:10001", runtime_env = {"working_dir": "./"})

:smile: :smile: :smile: