Submitting jobs without a runtime conda env succeeds, but when I add a conda env I get the following error:
RuntimeError: Version mismatch: The cluster was started with:
Ray: 2.8.0
Python: 3.10.8
This process on node 10.42.0.23 was started with:
Ray: 2.8.0
Python: 3.10.13
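For reference, the two Python versions in the message differ only in the patch component. Below is a minimal sketch of that comparison. The helper names `parse_version` and `describe_mismatch` are hypothetical, and this is not Ray's actual check, only an illustration of a strict full-version comparison that would be consistent with the error above.

```python
def parse_version(text):
    """Turn a dotted version string such as '3.10.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def describe_mismatch(cluster_version, process_version):
    """Classify how two Python versions differ (hypothetical helper)."""
    cluster = parse_version(cluster_version)
    process = parse_version(process_version)
    if cluster == process:
        return "match"
    # Same major.minor, different patch: the situation in the error above.
    if cluster[:2] == process[:2]:
        return "patch-level mismatch"
    return "major/minor mismatch"


print(describe_mismatch("3.10.8", "3.10.13"))  # patch-level mismatch
```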
The cluster is set up with the image rayproject/ray:2.8.0-py310. While exploring the image on DockerHub, I found that an image layer specifies the following line:
ARG PYTHON_VERSION=3.8.16
I have a raycluster.yaml file containing a basic cluster setup. It points to a custom image, built locally to include predefined conda environments, that uses rayproject/ray:2.8.0-py310 as its base image (more details attached).
We pass these values to kuberay/raycluster and spin it up. In the dashboard, under cluster info, the Python version is 3.10.8.
raycluster.yaml:
When inspecting the head and worker pods, the following details were obtained:
> python --version
Python 3.10.13
> conda info
python version : 3.10.8.final.0
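Since `python --version` and `conda info` disagree inside the pods, it may help to check which interpreter each command name actually resolves to. Below is a minimal sketch using only the standard library. The function names are hypothetical, and the list of command names to probe is an assumption. It could be run inside a pod, inside an activated conda env, and on the local terminal to compare the results.

```python
import os
import platform
import shutil
import subprocess
import sys


def current_interpreter():
    """Describe the interpreter that is running this script."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        # Set by conda when an environment is activated; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


def version_of(command):
    """Resolve a command name on PATH and ask it for its Python version."""
    path = shutil.which(command)
    if path is None:
        return None  # command not found on PATH
    result = subprocess.run(
        [path, "-c", "import platform; print(platform.python_version())"],
        capture_output=True,
        text=True,
        check=False,
    )
    return {"path": path, "version": result.stdout.strip() or None}


if __name__ == "__main__":
    print("current:", current_interpreter())
    # Assumed command names; adjust to whatever the entrypoint uses.
    for name in ("python", "python3", "python3.10"):
        print(name, "->", version_of(name))
```

Different paths for `python` and `python3.10` would indicate that the two names point at separate installations, which could explain seeing more than one patch version on the same machine.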
The job is submitted to that cluster with either of the following commands from the local terminal:
ray job submit --address http://localhost:8265 --runtime-env-json='{"working_dir": ".", "conda": "base_env"}' -- python3.10 rayscripts/workflow.py
# or
ray job submit --address http://localhost:8265 --runtime-env-json='{"working_dir": ".", "conda": "base_env"}' -- python rayscripts/workflow.py
# where
> python3.10 --version
> Python 3.10.8
# and
> python --version
> Python 3.10.12
My questions: why does Ray complain about the Python version when a conda env is present, how do we determine the exact Python version of the cluster, and where does Python 3.10.13 come from, given that the terminal's version is 3.10.12? The same question applies to 3.10.8 …