Ray and Python versions

I ran into this issue with my application. For context, I’m hosting a Ray cluster on AWS EC2 instances in a VPC. The instances are only accessible through a jump host, so I have a user-defined SSH proxy command in my cluster config file. Additionally, all traffic in the AWS environment goes through a proxy. The EC2 instances are configured with proxy info when they’re launched and, since I’m using Ray in Docker, the node containers get their proxy info from environment variables passed via Docker run options. My test job just trains a PPO agent with RLlib on a dummy environment defined in my script.

Similar to @ray1, I get gRPC timeout and Ray client/server errors when I run ray attach $config -p 10001 and call ray.init("ray://localhost:10001") in my test job script, but I’m using Python 3.8 and Ray 2.2. If I ray rsync_up my test job script and run it from the head node, everything works as expected. My pip freeze output is below (it shouldn’t be anything too wild, since it’s just the requirements of the rayproject/ray-ml:latest-py38-cpu Docker image):

absl-py==1.3.0
accelerate==0.5.1
adal==1.2.7
aiohttp==3.8.3
aiohttp-cors==0.7.0
aiorwlock==1.3.0
aiosignal==1.3.1
ale-py==0.7.5
alembic==1.4.1
anyio==3.6.2
applicationinsights==0.11.10
argcomplete==1.12.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.2.1
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.1.0
autocfg==0.0.8
autogluon.common==0.6.0
autogluon.core==0.6.0
autograd==1.5
autopage==0.5.1
AutoROM==0.4.2
AutoROM.accept-rom-license==0.4.2
ax-platform==0.2.4
azure-cli-core==2.40.0
azure-cli-telemetry==1.0.8
azure-common==1.1.28
azure-core==1.26.1
azure-identity==1.10.0
azure-mgmt-compute==23.1.0
azure-mgmt-core==1.3.2
azure-mgmt-network==19.0.0
azure-mgmt-resource==20.0.0
backcall==0.2.0
backoff==1.10.0
bayesian-optimization==1.2.0
bcrypt==4.0.1
beautifulsoup4==4.11.1
bleach==5.0.1
blessed==1.19.1
boto3==1.4.8
botocore==1.8.50
botorch==0.6.2
brotlipy==0.7.0
cachetools==5.2.0
catboost==1.1.1
certifi @ file:///croot/certifi_1665076670883/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
chex==0.1.5
click==8.1.3
cliff==4.1.0
cloudpickle==2.2.0
cma==3.2.2
cmaes==0.9.0
cmd2==2.4.2
colorama @ file:///tmp/build/80754af9/colorama_1607707115595/work
coloredlogs==15.0.1
colorful==0.5.5
colorlog==6.7.0
comet-ml==3.31.9
comm==0.1.1
commonmark==0.9.1
conda==22.11.1
conda-content-trust @ file:///tmp/build/80754af9/conda-content-trust_1617045594566/work
conda-package-handling @ file:///croot/conda-package-handling_1666940373510/work
configobj==5.0.6
ConfigSpace==0.4.18
contourpy==1.0.6
coolname==1.1.0
cryptography @ file:///croot/cryptography_1665612644927/work
cycler==0.11.0
Cython==0.29.26
dask==2021.11.2
databricks-cli==0.17.3
datasets==2.0.0
debugpy==1.6.4
decorator==5.1.1
decord==0.6.0
defusedxml==0.7.1
dill==0.3.6
distlib==0.3.6
distributed==2021.11.2
dm-tree==0.1.7
docker==6.0.1
docker-pycreds==0.4.0
docstring-parser==0.15
docutils==0.19
dopamine-rl==4.0.6
dragonfly-opt==0.1.6
dulwich==0.20.50
entrypoints==0.4
everett==3.1.0
exceptiongroup==1.0.4
executing==1.2.0
fastapi==0.88.0
fastjsonschema==2.16.2
filelock==3.8.2
FLAML==0.9.7
Flask==2.2.2
flatbuffers==1.12
flax==0.6.2
fonttools==4.38.0
freezegun==1.1.0
frozenlist==1.3.3
fsspec==2022.11.0
future==0.18.2
gast==0.4.0
gin-config==0.5.0
gitdb==4.0.10
GitPython==3.1.29
gluoncv==0.10.1.post0
google-api-core==2.11.0
google-api-python-client==1.7.8
google-auth==2.15.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-oauth==1.0.1
google-pasta==0.2.0
googleapis-common-protos==1.57.0
gpustat==1.0.0
GPy==1.10.0
gpytorch==1.9.0
graphviz==0.8.4
greenlet==2.0.1
grpcio==1.51.1
gunicorn==20.1.0
gym==0.24.0
gym-notices==0.0.8
h11==0.14.0
h5py==3.7.0
HeapDict==1.0.1
HEBO==0.3.2
higher==0.2.1
hpbandster==0.7.4
httplib2==0.21.0
huggingface-hub==0.11.1
humanfriendly==10.0
hyperopt==0.2.5
idna @ file:///tmp/build/80754af9/idna_1637925883363/work
imageio==2.22.4
imageio-ffmpeg==0.4.5
importlib-metadata==5.1.0
importlib-resources==5.10.1
iniconfig==1.1.1
ipykernel==6.19.0
ipython==8.7.0
ipython-genutils==0.2.0
ipywidgets==8.0.3
isodate==0.6.1
itsdangerous==2.1.2
jax==0.3.25
jaxlib==0.3.25
jedi==0.18.2
Jinja2==3.1.2
jmespath==0.10.0
joblib==1.2.0
jsonschema==4.17.3
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-events==0.5.0
jupyter_client==7.4.8
jupyter_core==5.1.0
jupyter_server==2.0.0
jupyter_server_terminals==0.4.2
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.4
kaggle-environments==1.7.11
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.4
knack==0.10.1
kubernetes==25.3.0
libclang==14.0.6
lightgbm==3.2.1
lightgbm-ray==0.1.5
lightning-bolts==0.4.0
linear-operator==0.3.0
locket==1.0.0
lz4==4.0.2
Mako==1.2.4
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.2
matplotlib-inline==0.1.6
mistune==2.0.4
mlagents-envs==0.28.0
mlflow==1.21.0
modin==0.12.1
mosaicml==0.10.1
mpmath==1.2.1
msal==1.18.0b1
msal-extensions==1.0.0
msgpack==1.0.4
msrest==0.7.1
msrestazure==0.6.4
multidict==6.0.3
multipledispatch==0.6.0
multiprocess==0.70.14
mxnet==1.8.0.post0
nbclassic==0.4.8
nbclient==0.7.2
nbconvert==7.2.6
nbformat==5.7.0
nest-asyncio==1.5.6
netifaces==0.11.0
networkx==2.8.8
nevergrad==0.4.3.post7
notebook==6.5.2
notebook_shim==0.2.2
numpy==1.23.5
nvidia-ml-py==11.495.46
oauthlib==3.2.2
onnx==1.12.0
onnxruntime==1.12.0
open-spiel==1.2
opencensus==0.11.0
opencensus-context==0.1.3
opencv-python==3.4.18.65
opentelemetry-api==1.1.0
opentelemetry-exporter-otlp==1.1.0
opentelemetry-exporter-otlp-proto-grpc==1.1.0
opentelemetry-proto==1.1.0
opentelemetry-sdk==1.1.0
opentelemetry-semantic-conventions==0.20b0
opt-einsum==3.3.0
optax==0.1.4
optuna==2.10.0
packaging==21.3
pandas==1.5.2
pandocfilters==1.5.0
paramiko==2.12.0
paramz==0.9.5
parso==0.8.3
partd==1.3.0
pathtools==0.1.2
patsy==0.5.3
pbr==5.11.0
PettingZoo==1.15.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.3.0
pkginfo==1.9.2
pkgutil_resolve_name==1.3.10
platformdirs==2.6.0
plotly==5.11.0
pluggy @ file:///tmp/build/80754af9/pluggy_1648042571233/work
portalocker==2.6.0
prettytable==3.5.0
prometheus-client==0.13.1
prometheus-flask-exporter==0.21.0
promise==2.3
prompt-toolkit==3.0.36
protobuf==3.20.3
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==8.0.0
py-spy==0.3.14
pyaml==21.10.1
pyarrow==10.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybullet==3.2.0
pycosat @ file:///croot/pycosat_1666805502580/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==1.10.2
pyDeprecate==0.3.1
pygame==2.1.2
pyglet==1.5.15
Pygments==2.13.0
PyJWT==2.6.0
pymoo==0.5.0
pymunk==6.0.0
PyNaCl==1.5.0
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing==3.0.9
pyperclip==1.8.2
pypng==0.20220715.0
Pyro4==4.82
pyrsistent==0.19.2
PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work
pytest==7.2.0
pytest-remotedata==0.3.2
python-dateutil==2.8.2
python-editor==1.0.4
python-json-logger==2.0.4
pytorch-lightning==1.5.10
pytorch-ranger==0.1.1
pytz==2022.6
PyWavelets==1.4.1
PyYAML==6.0
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
querystring-parser==1.2.4
ray @ file:///home/ray/ray-2.2.0-cp38-cp38-manylinux2014_x86_64.whl
ray-cpp==2.2.0
ray-lightning==0.2.0
recsim==0.2.4
redis==3.5.3
regex==2022.10.31
requests==2.28.1
requests-oauthlib==1.3.1
requests-toolbelt==0.10.1
responses==0.18.0
rich==12.6.0
rsa==4.9
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
s3transfer==0.1.13
scikit-image==0.19.3
scikit-learn==1.1.3
scikit-optimize==0.9.0
scipy==1.9.3
semantic-version==2.10.0
Send2Trash==1.8.0
sentencepiece==0.1.96
sentry-sdk==1.11.1
serpent==1.41
setproctitle==1.3.2
shortuuid==1.0.1
sigopt==7.5.0
six==1.13.0
smart-open==6.2.0
smmap==5.0.0
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
SQLAlchemy==1.4.44
sqlparse==0.4.3
stack-data==0.6.2
starlette==0.22.0
statsmodels==0.13.5
stevedore==4.1.1
SuperSuit==3.3.3
sympy==1.11.1
tabulate==0.9.0
tblib==1.7.0
tenacity==8.1.0
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorboardX==2.5.1
tensorflow==2.9.0
tensorflow-estimator==2.9.0
tensorflow-io-gcs-filesystem==0.28.0
tensorflow-probability==0.17.0
tensorstore==0.1.28
termcolor==2.1.1
terminado==0.17.1
tf-slim==1.1.0
tf2onnx==1.12.1
threadpoolctl==3.1.0
tifffile==2022.10.10
timm==0.4.5
tinycss2==1.2.1
tokenizers==0.12.1
tomli==2.0.1
toolz @ file:///croot/toolz_1667464077321/work
torch==1.12.1+cu116
torch-geometric==2.0.4
torch-optimizer==0.3.0
torch-scatter==2.0.9
torch-sparse==0.6.15+pt112cu116
torchmetrics==0.7.3
torchvision==0.13.1+cu116
tornado==6.2
tqdm @ file:///opt/conda/conda-bld/tqdm_1647339053476/work
traitlets==5.6.0
transformers==4.19.1
tune-sklearn==0.4.4
typeguard==2.13.3
typer==0.7.0
typing_extensions==4.4.0
uritemplate==3.0.1
urllib3==1.26.13
uvicorn==0.20.0
virtualenv==20.17.1
wandb==0.13.4
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.2
Werkzeug==2.2.2
widgetsnbextension==4.0.4
wrapt==1.14.1
wurlitzer==3.0.3
xgboost==1.3.3
xgboost-ray==0.1.10
xxhash==3.1.0
yacs==0.1.8
yahp==0.1.3
yarl==1.8.2
zict==2.2.0
zipp==3.11.0
zoopt==0.4.1

My EC2 security groups should already allow inbound and outbound traffic on port 10001, but I added rules to explicitly allow it and still had no luck. Any recommendations, @jjyao? I can share my test job script, but I can’t share much else about the cloud environment.
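
For anyone debugging a similar setup, here is a minimal sketch of two client-side checks that don't need Ray at all: whether the port forwarded by ray attach -p 10001 accepts TCP connections locally, and which proxy variables are set in the client's environment. The second check rests on an assumption that may not apply here: as far as I know, gRPC can route channels through http_proxy/https_proxy unless the target is excluded via no_proxy or no_grpc_proxy, so a proxy variable with no localhost exclusion is one possible suspect. The function names are made up for illustration.

```python
import os
import socket

# Proxy-related variables to report. Which of these gRPC actually honors
# (and in which letter case) is an assumption worth verifying for your version.
PROXY_VARS = ("http_proxy", "https_proxy", "grpc_proxy", "no_proxy", "no_grpc_proxy")


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def proxy_settings(environ=None) -> dict:
    """Collect proxy-related variables, checking lower- and upper-case names."""
    environ = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.upper()):
            if key in environ:
                found[key] = environ[key]
    return found


def host_in_no_proxy(host: str, no_proxy: str) -> bool:
    """Check for host as an exact entry in a comma-separated no_proxy value."""
    entries = [e.strip().lower() for e in no_proxy.split(",") if e.strip()]
    return host.lower() in entries


if __name__ == "__main__":
    # 10001 is the port forwarded by `ray attach $config -p 10001`.
    print("localhost:10001 accepts TCP:", port_is_open("localhost", 10001))
    settings = proxy_settings()
    print("proxy variables:", settings)
    no_proxy = settings.get("no_proxy", settings.get("NO_PROXY", ""))
    print("localhost excluded from proxy:", host_in_no_proxy("localhost", no_proxy))
```

Note that a successful TCP connect only shows that the local end of the SSH tunnel is listening. It says nothing about whether the Ray client server on the head node is reachable through the tunnel, so treat it as a first filter rather than proof.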

Edit: I also confirmed that all nodes and the local environment where my client connects and runs the test job script have the same dependencies as the pip freeze above.
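
One way to make the dependency comparison repeatable is to diff two pip freeze dumps programmatically instead of eyeballing them. The sketch below is just an illustration with hypothetical helper names (parse_freeze, diff_freeze) and made-up sample input. It assumes each line is either name==version or name @ location, which matches the format of the dump above.

```python
def parse_freeze(text: str) -> dict:
    """Map lower-cased package name -> version (or location) from pip freeze output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if " @ " in line:
            # Direct references such as "ray @ file:///..."
            name, spec = line.split(" @ ", 1)
        elif "==" in line:
            name, spec = line.split("==", 1)
        else:
            name, spec = line, ""
        packages[name.strip().lower()] = spec.strip()
    return packages


def diff_freeze(first: str, second: str) -> dict:
    """Return {name: (spec_in_first, spec_in_second)} for every mismatch."""
    a, b = parse_freeze(first), parse_freeze(second)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real environment.
    local = "grpcio==1.51.1\nprotobuf==3.20.3\n"
    head = "grpcio==1.50.0\nprotobuf==3.20.3\n"
    for name, (mine, theirs) in diff_freeze(local, head).items():
        print(f"{name}: local={mine} head={theirs}")
```

Entries installed from a local path (the name @ file:///... lines) can show up as mismatches even when the underlying version is identical, because the path differs between machines, so those need a manual look.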