Ray and Python versions

Hi Ray Team,

Are there any restrictions or limitations on running Ray >= 1.10 with Python 3.6.9?
For some reason, with Ray versions above 1.8 I’m getting the following error (with 1.6 it works fine):

2022-07-21 08:54:30,177 INFO server.py:842 -- Starting Ray Client server on 0.0.0.0:10001
2022-07-21 10:15:17,379 INFO proxier.py:670 -- New data connection from client ace70b4b753146babdea12418dbbb528:
2022-07-21 10:15:18,403 INFO proxier.py:341 -- SpecificServer started on port: 23000 with PID: 415 for client: ace70b4b753146babdea12418dbbb528
2022-07-21 10:15:48,405 ERROR proxier.py:379 -- Timeout waiting for channel for ace70b4b753146babdea12418dbbb528
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/util/client/server/proxier.py", line 375, in get_channel
    timeout=CHECK_CHANNEL_TIMEOUT_S
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 140, in result
    self._block(timeout)
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 86, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
2022-07-21 10:15:48,405 ERROR proxier.py:379 -- Timeout waiting for channel for ace70b4b753146babdea12418dbbb528
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/ray/util/client/server/proxier.py", line 375, in get_channel
    timeout=CHECK_CHANNEL_TIMEOUT_S
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 140, in result
    self._block(timeout)
  File "/usr/local/lib/python3.6/dist-packages/grpc/_utilities.py", line 86, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
2022-07-21 10:15:48,406 ERROR proxier.py:692 -- Channel not found for ace70b4b753146babdea12418dbbb528
2022-07-21 10:15:48,406 WARNING proxier.py:777 -- Retrying Logstream connection. 1 attempts failed.

python version 3.6.9
ray version > 1.10

Thanks

Hi @ray1, welcome to the community!

I suspect this has to do with the version of the Python grpc library you are using. Would you mind sharing your pip freeze output?

On another note, can you try ray==1.13.0 to see if this problem still exists?
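For anyone following along: if a full pip freeze is too noisy, a minimal sketch like the one below reports only the versions that matter for this question. The package list (ray, grpcio, grpcio-tools) is an assumption based on what is discussed in this thread, and the helper names are made up for illustration.

```python
import platform

try:
    from importlib import metadata as importlib_metadata  # available on Python 3.8+
except ImportError:
    importlib_metadata = None  # e.g. Python 3.6, handled by the fallback below


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    if importlib_metadata is not None:
        try:
            return importlib_metadata.version(package)
        except importlib_metadata.PackageNotFoundError:
            return None
    import pkg_resources  # fallback for older interpreters; ships with setuptools

    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def environment_report(packages=("ray", "grpcio", "grpcio-tools")):
    """Collect the Python version plus the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}=={version}")
```

Running this in each environment (head, worker, and the client side) gives a short list that is easy to compare by eye.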

Hi @Chen_Shen, thanks for replying so quickly.
The output of pip freeze | grep grpc is the same on the head/worker pods and the job pod:
grpcio==1.39.0
grpcio-tools==1.39.0
Unfortunately, 1.13.0 does not resolve my issue.
I double-checked that all my pods are running ray==1.13.0.

I’m running in a Kubernetes (k8s) deployment.

Ah, I realize it’s a Ray client/server connection issue. Just checking: are you using the same Python and Ray versions on the Ray client and server?

Yes, both pods (client and server) have the same Ray and Python versions. I’m using my own Docker image with the ray-1.13.0-py36 wheel installed on it.
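A small sketch of how the "same versions on client and server" check could be made explicit rather than eyeballed. The function name and the dictionary layout are hypothetical; the example values are the versions reported in this thread.

```python
def find_mismatches(client, server):
    """Return {name: (client_version, server_version)} for every entry that differs.

    A package present on only one side shows up with None for the other side.
    """
    names = sorted(set(client) | set(server))
    return {
        name: (client.get(name), server.get(name))
        for name in names
        if client.get(name) != server.get(name)
    }


# Versions as reported in this thread for the client and server pods.
client_env = {"python": "3.6.9", "ray": "1.13.0", "grpcio": "1.39.0"}
server_env = {"python": "3.6.9", "ray": "1.13.0", "grpcio": "1.39.0"}

mismatches = find_mismatches(client_env, server_env)
if mismatches:
    for name, (client_version, server_version) in mismatches.items():
        print(f"{name}: client={client_version} server={server_version}")
else:
    print("client and server versions match")
```

An empty result only rules out a version mismatch for the listed packages; it says nothing about networking between the pods.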

@ray1

Do you have a repro script that reproduces this error?

Can you also share the rest of the contents of pip freeze, if possible?

Ran into this issue for my application. For context, I’m hosting a Ray Cluster on AWS EC2s in a VPC. The instances are only accessible through a jump host, so I have a user-defined SSH proxy command in my cluster config file. Additionally, the AWS environment traffic all goes through a proxy. EC2 instances are configured with proxy info when they’re launched, and, since I’m using Ray in Docker, the node Docker containers have proxy info configured through environment variables with Docker run options. My test job is just training a PPO agent with RLlib using a dummy environment defined in my script.

Similar to @ray1, I get gRPC timeout and Ray client/server errors when I ray attach $config -p 10001 and use ray.init("ray://localhost:10001") in my test job script, but I’m using Python 3.8 and Ray 2.2. I can ray rsync_up my test job script and run it from the head node and everything works as expected. My pip freeze contents are below (shouldn’t be anything too wild since it’s just the rayproject/ray-ml:latest-py38-cpu Docker image requirements):

absl-py==1.3.0
accelerate==0.5.1
adal==1.2.7
aiohttp==3.8.3
aiohttp-cors==0.7.0
aiorwlock==1.3.0
aiosignal==1.3.1
ale-py==0.7.5
alembic==1.4.1
anyio==3.6.2
applicationinsights==0.11.10
argcomplete==1.12.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.2.1
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.1.0
autocfg==0.0.8
autogluon.common==0.6.0
autogluon.core==0.6.0
autograd==1.5
autopage==0.5.1
AutoROM==0.4.2
AutoROM.accept-rom-license==0.4.2
ax-platform==0.2.4
azure-cli-core==2.40.0
azure-cli-telemetry==1.0.8
azure-common==1.1.28
azure-core==1.26.1
azure-identity==1.10.0
azure-mgmt-compute==23.1.0
azure-mgmt-core==1.3.2
azure-mgmt-network==19.0.0
azure-mgmt-resource==20.0.0
backcall==0.2.0
backoff==1.10.0
bayesian-optimization==1.2.0
bcrypt==4.0.1
beautifulsoup4==4.11.1
bleach==5.0.1
blessed==1.19.1
boto3==1.4.8
botocore==1.8.50
botorch==0.6.2
brotlipy==0.7.0
cachetools==5.2.0
catboost==1.1.1
certifi @ file:///croot/certifi_1665076670883/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
chex==0.1.5
click==8.1.3
cliff==4.1.0
cloudpickle==2.2.0
cma==3.2.2
cmaes==0.9.0
cmd2==2.4.2
colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
coloredlogs==15.0.1
colorful==0.5.5
colorlog==6.7.0
comet-ml==3.31.9
comm==0.1.1
commonmark==0.9.1
conda==22.11.1
conda-content-trust @ file:///tmp/build/80754af9/conda-content-trust_1617045594566/work
conda-package-handling @ file:///croot/conda-package-handling_1666940373510/work
configobj==5.0.6
ConfigSpace==0.4.18
contourpy==1.0.6
coolname==1.1.0
cryptography @ file:///croot/cryptography_1665612644927/work
cycler==0.11.0
Cython==0.29.26
dask==2021.11.2
databricks-cli==0.17.3
datasets==2.0.0
debugpy==1.6.4
decorator==5.1.1
decord==0.6.0
defusedxml==0.7.1
dill==0.3.6
distlib==0.3.6
distributed==2021.11.2
dm-tree==0.1.7
docker==6.0.1
docker-pycreds==0.4.0
docstring-parser==0.15
docutils==0.19
dopamine-rl==4.0.6
dragonfly-opt==0.1.6
dulwich==0.20.50
entrypoints==0.4
everett==3.1.0
exceptiongroup==1.0.4
executing==1.2.0
fastapi==0.88.0
fastjsonschema==2.16.2
filelock==3.8.2
FLAML==0.9.7
Flask==2.2.2
flatbuffers==1.12
flax==0.6.2
fonttools==4.38.0
freezegun==1.1.0
frozenlist==1.3.3
fsspec==2022.11.0
future==0.18.2
gast==0.4.0
gin-config==0.5.0
gitdb==4.0.10
GitPython==3.1.29
gluoncv==0.10.1.post0
google-api-core==2.11.0
google-api-python-client==1.7.8
google-auth==2.15.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-oauth==1.0.1
google-pasta==0.2.0
googleapis-common-protos==1.57.0
gpustat==1.0.0
GPy==1.10.0
gpytorch==1.9.0
graphviz==0.8.4
greenlet==2.0.1
grpcio==1.51.1
gunicorn==20.1.0
gym==0.24.0
gym-notices==0.0.8
h11==0.14.0
h5py==3.7.0
HeapDict==1.0.1
HEBO==0.3.2
higher==0.2.1
hpbandster==0.7.4
httplib2==0.21.0
huggingface-hub==0.11.1
humanfriendly==10.0
hyperopt==0.2.5
idna @ file:///tmp/build/80754af9/idna_1637925883363/work
imageio==2.22.4
imageio-ffmpeg==0.4.5
importlib-metadata==5.1.0
importlib-resources==5.10.1
iniconfig==1.1.1
ipykernel==6.19.0
ipython==8.7.0
ipython-genutils==0.2.0
ipywidgets==8.0.3
isodate==0.6.1
itsdangerous==2.1.2
jax==0.3.25
jaxlib==0.3.25
jedi==0.18.2
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.2.0
jsonschema==4.17.3
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-events==0.5.0
jupyter_client==7.4.8
jupyter_core==5.1.0
jupyter_server==2.0.0
jupyter_server_terminals==0.4.2
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.4
kaggle-environments==1.7.11
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.4
knack==0.10.1
kubernetes==25.3.0
libclang==14.0.6
lightgbm==3.2.1
lightgbm-ray==0.1.5
lightning-bolts==0.4.0
linear-operator==0.3.0
locket==1.0.0
lz4==4.0.2
Mako==1.2.4
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.2
matplotlib-inline==0.1.6
mistune==2.0.4
mlagents-envs==0.28.0
mlflow==1.21.0
modin==0.12.1
mosaicml==0.10.1
mpmath==1.2.1
msal==1.18.0b1
msal-extensions==1.0.0
msgpack==1.0.4
msrest==0.7.1
msrestazure==0.6.4
multidict==6.0.3
multipledispatch==0.6.0
multiprocess==0.70.14
mxnet==1.8.0.post0
nbclassic==0.4.8
nbclient==0.7.2
nbconvert==7.2.6
nbformat==5.7.0
nest-asyncio==1.5.6
netifaces==0.11.0
networkx==2.8.8
nevergrad==0.4.3.post7
notebook==6.5.2
notebook_shim==0.2.2
numpy==1.23.5
nvidia-ml-py==11.495.46
oauthlib==3.2.2
onnx==1.12.0
onnxruntime==1.12.0
open-spiel==1.2
opencensus==0.11.0
opencensus-context==0.1.3
opencv-python==3.4.18.65
opentelemetry-api==1.1.0
opentelemetry-exporter-otlp==1.1.0
opentelemetry-exporter-otlp-proto-grpc==1.1.0
opentelemetry-proto==1.1.0
opentelemetry-sdk==1.1.0
opentelemetry-semantic-conventions==0.20b0
opt-einsum==3.3.0
optax==0.1.4
optuna==2.10.0
packaging==21.3
pandas==1.5.2
pandocfilters==1.5.0
paramiko==2.12.0
paramz==0.9.5
parso==0.8.3
partd==1.3.0
pathtools==0.1.2
patsy==0.5.3
pbr==5.11.0
PettingZoo==1.15.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.3.0
pkginfo==1.9.2
pkgutil_resolve_name==1.3.10
platformdirs==2.6.0
plotly==5.11.0
pluggy @ file:///tmp/build/80754af9/pluggy_1648042571233/work
portalocker==2.6.0
prettytable==3.5.0
prometheus-client==0.13.1
prometheus-flask-exporter==0.21.0
promise==2.3
prompt-toolkit==3.0.36
protobuf==3.20.3
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==8.0.0
py-spy==0.3.14
pyaml==21.10.1
pyarrow==10.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybullet==3.2.0
pycosat @ file:///croot/pycosat_1666805502580/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==1.10.2
pyDeprecate==0.3.1
pygame==2.1.2
pyglet==1.5.15
Pygments==2.13.0
PyJWT==2.6.0
pymoo==0.5.0
pymunk==6.0.0
PyNaCl==1.5.0
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing==3.0.9
pyperclip==1.8.2
pypng==0.20220715.0
Pyro4==4.82
pyrsistent==0.19.2
PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work
pytest==7.2.0
pytest-remotedata==0.3.2
python-dateutil==2.8.2
python-editor==1.0.4
python-json-logger==2.0.4
pytorch-lightning==1.5.10
pytorch-ranger==0.1.1
pytz==2022.6
PyWavelets==1.4.1
PyYAML==6.0
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
querystring-parser==1.2.4
ray @ file:///home/ray/ray-2.2.0-cp38-cp38-manylinux2014_x86_64.whl
ray-cpp==2.2.0
ray-lightning==0.2.0
recsim==0.2.4
redis==3.5.3
regex==2022.10.31
requests==2.28.1
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
responses==0.18.0
rich==12.6.0
rsa==4.9
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
s3transfer==0.1.13
scikit-image==0.19.3
scikit-learn==1.1.3
scikit-optimize==0.9.0
scipy==1.9.3
semantic-version==2.10.0
Send2Trash==1.8.0
sentencepiece==0.1.96
sentry-sdk==1.11.1
serpent==1.41
setproctitle==1.3.2
shortuuid==1.0.1
sigopt==7.5.0
six==1.13.0
smart-open==6.2.0
smmap==5.0.0
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
SQLAlchemy==1.4.44
sqlparse==0.4.3
stack-data==0.6.2
starlette==0.22.0
statsmodels==0.13.5
stevedore==4.1.1
SuperSuit==3.3.3
sympy==1.11.1
tabulate==0.9.0
tblib==1.7.0
tenacity==8.1.0
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorboardX==2.5.1
tensorflow==2.9.0
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.28.0
tensorflow-probability==0.17.0
tensorstore==0.1.28
termcolor==2.1.1
terminado==0.17.1
tf-slim==1.1.0
tf2onnx==1.12.1
threadpoolctl==3.1.0
tifffile==2022.10.10
timm==0.4.5
tinycss2==1.2.1
tokenizers==0.12.1
tomli==2.0.1
toolz @ file:///croot/toolz_1667464077321/work
torch==1.12.1+cu116
torch-geometric==2.0.4
torch-optimizer==0.3.0
torch-scatter==2.0.9
torch-sparse==0.6.15+pt112cu116
torchmetrics==0.7.3
torchvision==0.13.1+cu116
tornado==6.2
tqdm @ file:///opt/conda/conda-bld/tqdm_1647339053476/work
traitlets==5.6.0
transformers==4.19.1
tune-sklearn==0.4.4
typeguard==2.13.3
typer==0.7.0
typing_extensions==4.4.0
uritemplate==3.0.1
urllib3==1.26.13
uvicorn==0.20.0
virtualenv==20.17.1
wandb==0.13.4
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.2
Werkzeug==2.2.2
widgetsnbextension==4.0.4
wrapt==1.14.1
wurlitzer==3.0.3
xgboost==1.3.3
xgboost-ray==0.1.10
xxhash==3.1.0
yacs==0.1.8
yahp==0.1.3
yarl==1.8.2
zict==2.2.0
zipp==3.11.0
zoopt==0.4.1

My EC2 security groups should already allow in/out traffic over 10001 as well, but I added rules to explicitly allow it and still no luck. Any recommendations @jjyao? I can share my test job script, but can’t share too much else on the cloud environment.

Edit: I also confirmed that all nodes, as well as the client in my local environment that attempts to connect and run the test job script, share the same dependencies as the pip freeze above.
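Before digging into gRPC itself, a basic TCP check of the forwarded Ray Client port can narrow things down. This is only a sketch: the helper name is made up, and localhost:10001 is taken from the ray attach / ray.init("ray://localhost:10001") setup described in this post.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 10001 is the Ray Client port forwarded by `ray attach $config -p 10001`.
    print("Ray Client port reachable:", port_is_open("localhost", 10001))
```

Note that a successful connection only shows that something is listening on the local end of the forward. It does not prove that the Ray Client server behind it can start the per-client server that the "Timeout waiting for channel" log lines refer to.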

For folks tuning in who are blocked by this: a workaround is to use the ray dashboard port forwarding and the ray job submit CLI together. This lets you submit jobs to the cluster from your local machine. There are already so many CLIs to follow, though, so it’d be nice if we could connect to the remote cluster and run a job by only changing the ray.init() address.
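A rough sketch of that workaround, written as a small Python wrapper around the two CLIs. The file names cluster.yaml and test_job.py are placeholders, the helper names are made up, and http://127.0.0.1:8265 assumes the dashboard is forwarded on its default port.

```python
import shlex
import subprocess


def dashboard_forward_command(cluster_config):
    """Command that port-forwards the cluster's dashboard to the local machine."""
    return ["ray", "dashboard", cluster_config]


def job_submit_command(script, address="http://127.0.0.1:8265", working_dir="."):
    """Command that submits `script` as a Ray job through the forwarded dashboard."""
    return [
        "ray", "job", "submit",
        "--address", address,
        "--working-dir", working_dir,
        "--", "python", script,
    ]


def format_command(command):
    """Render a command list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(part) for part in command)


def submit(script, **kwargs):
    """Run `ray job submit`; assumes the dashboard forward is already running."""
    return subprocess.run(job_submit_command(script, **kwargs), check=True)


if __name__ == "__main__":
    # Terminal 1: keep the port forward open.
    print(format_command(dashboard_forward_command("cluster.yaml")))
    # Terminal 2: submit the job from the local machine.
    print(format_command(job_submit_command("test_job.py")))
```

The working directory is uploaded with the job, so the script runs on the cluster rather than on the local machine.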

Hi @ognf,

Sorry for the late reply. Ray client is no longer the recommended way to run your Ray applications. For development, you can run your script from the head node directly. For production, you can use Ray jobs (i.e. ray job submit). This way you don’t need to worry about mismatched environments between your laptop and the remote cluster.
