Hi team,
When I connect to an existing cluster without a runtime_env
parameter, it works fine.
But when I pass a runtime_env
to the cluster, the call hangs for several minutes and then raises this error:
ConnectionAbortedError: Initialization failure from server:
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 704, in Datapath
if not self.proxy_manager.start_specific_server(
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 305, in start_specific_server
serialized_runtime_env_context = self._create_runtime_env(
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 281, in _create_runtime_env
raise TimeoutError(
TimeoutError: GetOrCreateRuntimeEnv request failed after 5 attempts. Last exception: HTTP Error 503: Service Unavailable
The code I’m using is just:
In [1]: import ray
In [2]: ray.init(address="ray://localhost:10001", runtime_env={"pip": ["emoji"]})
The head node log looks like this:
2024-04-15 02:38:37,429 INFO server.py:885 -- Starting Ray Client server on 0.0.0.0:10001, args Namespace(address='240.52.20.253:6379', host='0.0.0.0', mode='proxy', port=10001, redis_password=None, runtime_env_agent_address='http://240.52.20.253:50805')
2024-04-15 02:40:49,943 INFO proxier.py:696 -- New data connection from client a56378b6ab814ecbb066279448514367:
2024-04-15 02:40:49,952 INFO proxier.py:223 -- Increasing runtime env reference for ray_client_server_23000.Serialized runtime env is {"_ray_commit": "9be5a16e3ccad0710bba08d0f75e9ff774ae6880", "pip": {"packages": ["emoji"], "pip_check": false}}.
2024-04-15 02:41:49,001 WARNING proxier.py:270 -- GetOrCreateRuntimeEnv request failed: HTTP Error 503: Service Unavailable. Retrying after 0.5s. 5 retries remaining.
2024-04-15 02:42:49,001 WARNING proxier.py:270 -- GetOrCreateRuntimeEnv request failed: HTTP Error 503: Service Unavailable. Retrying after 1.0s. 4 retries remaining.
2024-04-15 02:43:50,001 WARNING proxier.py:270 -- GetOrCreateRuntimeEnv request failed: HTTP Error 503: Service Unavailable. Retrying after 2.0s. 3 retries remaining.
2024-04-15 02:44:52,001 WARNING proxier.py:270 -- GetOrCreateRuntimeEnv request failed: HTTP Error 503: Service Unavailable. Retrying after 4.0s. 2 retries remaining.
2024-04-15 02:45:56,002 WARNING proxier.py:270 -- GetOrCreateRuntimeEnv request failed: HTTP Error 503: Service Unavailable. Retrying after 8.0s. 1 retries remaining.
2024-04-15 02:47:04,001 WARNING proxier.py:270 -- GetOrCreateRuntimeEnv request failed: HTTP Error 503: Service Unavailable. Retrying after 16.0s. 0 retries remaining.
2024-04-15 02:47:50,016 INFO proxier.py:768 -- a56378b6ab814ecbb066279448514367 last started stream at 1713174049.9413047. Current stream started at 1713174049.9413047.
I dug a little into the source code and found that the cause may be that I’m using a proxy to connect to the internet, so the request to the runtime env agent could be going through the proxy. However, I’ve already added the IP CIDR 240.52.0.0/16
to both the no_proxy
and NO_PROXY
environment variables. I’m not sure, but it does not seem to take effect.
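One way to test the proxy theory is sketched below. It rests on an assumption that is not confirmed here: that the Ray Client server sends the GetOrCreateRuntimeEnv request through Python's standard urllib (the "HTTP Error 503: Service Unavailable" message has the shape of a urllib error). If that holds, the sketch checks whether urllib would bypass the proxy for the runtime_env_agent_address shown in the head log. The helper name bypasses_proxy is hypothetical; urllib.request.proxy_bypass_environment is the standard-library function that evaluates no_proxy by exact host or domain-suffix match.

```python
import urllib.request


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if urllib would skip the proxy for `host`.

    `bypasses_proxy` is a hypothetical helper for illustration. It passes
    the no_proxy value explicitly instead of reading the environment, so
    the check is reproducible on any machine.
    """
    proxies = {"no": no_proxy}
    return bool(urllib.request.proxy_bypass_environment(host, proxies))


# Host and port taken from runtime_env_agent_address in the head log.
agent = "240.52.20.253:50805"

# The CIDR entry currently set in no_proxy / NO_PROXY.
print("CIDR entry:", bypasses_proxy(agent, "240.52.0.0/16"))

# The same check with the agent's literal IP address instead of a CIDR range.
print("Literal IP:", bypasses_proxy(agent, "240.52.20.253"))
```

If the CIDR check reports False while the literal IP reports True, then under the urllib assumption the CIDR entry is not being matched, and listing the head node IP explicitly in no_proxy and NO_PROXY (in the environment of the process that starts the head node) would be worth trying.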
I also tried the same code through the Job API, and it works:
ray job submit --address="http://localhost:8265" --runtime-env-json='{"pip": ["emoji"]}' -- python test_ray_job.py
Job submission server address: http://localhost:8265
-------------------------------------------------------
Job 'raysubmit_gZuVtxj2Uc8H4q1G' submitted successfully
-------------------------------------------------------
Next steps
Query the logs of the job:
ray job logs raysubmit_gZuVtxj2Uc8H4q1G
Query the status of the job:
ray job status raysubmit_gZuVtxj2Uc8H4q1G
Request the job to be stopped:
ray job stop raysubmit_gZuVtxj2Uc8H4q1G
Can you help me figure out what I should do to make it work?
Thank you.