Are there any restrictions or limitations on running Ray >= 1.10 with Python 3.6.9?
For some reason, with any Ray version above 1.8 I get the following error (with 1.6 it works fine):
2022-07-21 08:54:30,177 INFO server.py:842 -- Starting Ray Client server on 0.0.0.0:10001
2022-07-21 10:15:17,379 INFO proxier.py:670 -- New data connection from client ace70b4b753146babdea12418dbbb528:
2022-07-21 10:15:18,403 INFO proxier.py:341 -- SpecificServer started on port: 23000 with PID: 415 for client: ace70b4b753146babdea12418dbbb528
2022-07-21 10:15:48,405 ERROR proxier.py:379 -- Timeout waiting for channel for ace70b4b753146babdea12418dbbb528
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/util/client/server/proxier.py", line 375, in get_channel
    timeout=CHECK_CHANNEL_TIMEOUT_S
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 140, in result
    self._block(timeout)
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 86, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
2022-07-21 10:15:48,405 ERROR proxier.py:379 -- Timeout waiting for channel for ace70b4b753146babdea12418dbbb528
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/util/client/server/proxier.py", line 375, in get_channel
    timeout=CHECK_CHANNEL_TIMEOUT_S
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 140, in result
    self._block(timeout)
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 86, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
2022-07-21 10:15:48,406 ERROR proxier.py:692 -- Channel not found for ace70b4b753146babdea12418dbbb528
2022-07-21 10:15:48,406 WARNING proxier.py:777 -- Retrying Logstream connection. 1 attempts failed.
Hi @Chen_Shen, thanks for replying so quickly.
The output from pip freeze | grep grpc on the head, the workers, and the job is:
grpcio==1.39.0
grpcio-tools==1.39.0
Unfortunately, 1.13.0 does not resolve my issue.
I double-checked that all my pods are running ray==1.13.0.
Ran into this issue for my application. For context, I'm hosting a Ray cluster on AWS EC2 instances in a VPC. The instances are only accessible through a jump host, so I have a user-defined SSH proxy command in my cluster config file. Additionally, all traffic in the AWS environment goes through a proxy: EC2 instances are configured with proxy info when they're launched, and, since I'm using Ray in Docker, the node Docker containers get the proxy info through environment variables set via Docker run options. My test job just trains a PPO agent with RLlib on a dummy environment defined in my script.
Similar to @ray1, I get gRPC timeout and Ray client/server errors when I run ray attach $config -p 10001 and use ray.init("ray://localhost:10001") in my test job script, but I'm using Python 3.8 and Ray 2.2. If I ray rsync_up my test job script and run it from the head node, everything works as expected. My pip freeze contents are below (shouldn't be anything too wild, since they're just the rayproject/ray-ml:latest-py38-cpu Docker image requirements):
My EC2 security groups should already allow inbound/outbound traffic on port 10001, but I added rules to explicitly allow it and still had no luck. Any recommendations, @jjyao? I can share my test job script, but I can't share much else about the cloud environment.
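Stripped down, the client side of the script is roughly this sketch (the ping task is just a placeholder to exercise the connection; the actual RLlib training code is omitted):

import ray

# Tunnel opened beforehand with: ray attach <cluster config> -p 10001
ray.init("ray://localhost:10001")

@ray.remote
def ping():
    return "pong"

# A trivial round trip over the Ray Client connection.
print(ray.get(ping.remote()))

ray.shutdown()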
Edit: I also confirmed that all nodes, as well as the local client trying to connect and run the test job script, share the same dependencies as the pip freeze above.
For folks tuning in who are blocked by this: a workaround is to use the ray dashboard port-forwarding and ray job submit CLIs together. That lets you submit jobs to the cluster from your local machine. There are already so many CLIs to juggle, though, so it would be nice if we could connect to the remote cluster and run a job just by changing the ray.init() address.
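If you'd rather stay in Python than chain the CLIs, the Job Submission SDK can do the same thing against the port-forwarded dashboard; a rough sketch (the script name and working dir are placeholders for whatever your job needs):

from ray.job_submission import JobSubmissionClient

# The dashboard is reachable locally after running
# ray dashboard <cluster config>, which forwards port 8265 from the head node.
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    entrypoint="python test_job.py",   # placeholder entrypoint
    runtime_env={"working_dir": "."},  # ship the local directory to the cluster
)
print(client.get_job_status(job_id))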
Sorry for the late reply. Ray Client is no longer the recommended way to run your Ray applications. For development, you can run your script directly on the head node. For production, you can use Ray Jobs (i.e. ray job submit). That way you don't need to worry about mismatched environments between your laptop and the remote cluster.