How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
(I am using Ray 2.2.0)
Hi. I am quite new to Ray and am setting up my first Ray entrypoint. I have created a Docker image with a conda environment installed in it, which I want to activate when I submit my jobs through the JobSubmissionClient. When I do this, my job gets stuck in PENDING and never starts. I think this is caused by a mismatch in Python/Ray versions between the cluster and my local machine, because the documentation states:
" conda environments must have the same Python version as the Ray cluster. Do not list ray in the conda dependencies, as it will be automatically installed."
(Environment Dependencies — Ray 2.2.0)
I am launching the job from the same environment I want to run on the cluster, but this does not work. When the documentation says “the Ray cluster”, does it refer to what is installed outside of conda, to the conda base environment, or to the environment I want to run?
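The documentation note quoted above applies to a runtime_env whose "conda" field is an inline environment spec. A minimal sketch of building such a spec, assuming the interpreter running the script has the same Python major.minor version as the cluster (which is exactly the thing to verify): it pins python to that version and rejects ray in the conda-level dependencies, as the docs ask. The helper name make_conda_runtime_env and the example packages are hypothetical.

```python
import sys


def make_conda_runtime_env(conda_deps, pip_deps=()):
    """Return a runtime_env dict with an inline conda spec (hypothetical helper).

    python is pinned to the major.minor of the interpreter running this code,
    which is only correct if that matches the Ray cluster's Python version.
    """
    for dep in conda_deps:
        # Strip version constraints and extras to get the bare package name.
        name = dep
        for sep in "=<>[ ":
            name = name.split(sep)[0]
        if name.lower() == "ray":
            raise ValueError(f"do not list ray in the conda dependencies: {dep!r}")
    python_pin = "python=%d.%d" % sys.version_info[:2]
    dependencies = [python_pin, *conda_deps]
    if pip_deps:
        # pip packages go in a nested {"pip": [...]} entry, as in environment.yml.
        dependencies += ["pip", {"pip": list(pip_deps)}]
    return {"conda": {"dependencies": dependencies}}
```

It would be passed as runtime_env=make_conda_runtime_env(["numpy"]) to client.submit_job(...). If the cluster's Python differs from the local one, the pin has to be set by hand instead.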
Hi @AxelN, that comment only applies if you’re using runtime_env with the "conda" field. Are you using runtime environments? If so, can you see if there are any details in dashboard_agent.log or runtime_env_setup_*.log about the runtime env setup? By default these logs are located at /tmp/ray/session_latest/logs.
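To check those logs without browsing the directory by hand, here is a small sketch. It must run on the node (or inside the container) that is setting up the job, and it assumes the default /tmp/ray/session_latest/logs location from the reply. The glob runtime_env_setup*.log is deliberately a little broader than the name given above so it matches either a "-" or "_" separator. The function names are hypothetical.

```python
import glob
import os

# Default log directory named in the reply above.
DEFAULT_LOG_DIR = "/tmp/ray/session_latest/logs"


def find_runtime_env_logs(log_dir=DEFAULT_LOG_DIR):
    """Return sorted paths of the dashboard agent and runtime-env setup logs."""
    paths = []
    for pattern in ("dashboard_agent.log", "runtime_env_setup*.log"):
        paths.extend(glob.glob(os.path.join(log_dir, pattern)))
    return sorted(paths)


def tail(path, n=20):
    """Return the last n lines of a text file (undecodable bytes replaced)."""
    with open(path, errors="replace") as f:
        lines = f.readlines()
    return lines[-n:] if n > 0 else []


if __name__ == "__main__":
    for path in find_runtime_env_logs():
        print(f"==> {path} <==")
        print("".join(tail(path)), end="")
```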
If you don’t need to specify different environments at runtime, you don’t need to use runtime_env. If the Ray cluster was started (i.e. ray start was called) inside the desired conda env, then any jobs submitted will automatically use that conda env. So perhaps you could set up your docker image entrypoint to activate the conda env before calling ray start.
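One way to confirm which interpreter a submitted job actually runs in is to make the entrypoint a tiny report script and compare its output with the same script run locally. A sketch, with a hypothetical file name env_report.py. CONDA_DEFAULT_ENV and CONDA_PREFIX are set by conda activation and may be missing if the env was never activated in the process that called ray start, so sys.executable is the more reliable signal.

```python
import os
import sys


def env_report():
    """Collect facts about the interpreter this process is running in."""
    report = {
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        # Full path of the interpreter; shows which env's python is in use.
        "executable": sys.executable,
        # Set by `conda activate`; None if no conda env was activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }
    try:
        import ray

        report["ray_version"] = ray.__version__
    except ImportError:
        report["ray_version"] = None
    return report


if __name__ == "__main__":
    for key, value in env_report().items():
        print(f"{key}: {value}")
```

Submitted with entrypoint="python env_report.py" and a working_dir containing the file, its output can be read back with client.get_job_logs(job_id).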
Thanks. Yes, I am using a runtime_env because I have to use conda (some of the packages I need are not available on pip). Calling ray start inside the conda env could be an option instead, so I will try that. Thanks for the help!
Thanks for that! In my situation, I’d like to use the runtime_env just to upload my code to the worker, but I also want the conda env already in the Docker image to be used (not one that I specify in the runtime_env). Something like this:
```python
job_id = client.submit_job(
    # Entrypoint command to execute
    entrypoint=full_command,
    entrypoint_num_cpus=1,
    entrypoint_num_gpus=num_gpus,
    # Upload working_dir so the job always runs the latest version of the code.
    runtime_env={
        "working_dir": working_dir,
    },
)
```
Specifying the working_dir ensures that the latest code is pushed to the worker. Unfortunately, when I do that, the conda env I defined in my Dockerfile, and in which the worker is started with CMD ["ray", "start", "--log-style=auto", "--address=ray-head:6023", "--num-gpus=1", "--block"], is not used.
Found it: I just needed to specify the name of the conda env in the runtime_env parameter, like this:
```python
job_id = client.submit_job(
    # Entrypoint command to execute
    entrypoint=full_command,
    entrypoint_num_gpus=num_gpus,
    runtime_env={
        # Upload working_dir so the job always runs the latest version of the code.
        "working_dir": working_dir,
        # Name of the existing conda env (already installed in the Docker image).
        "conda": "base",
    },
)
```
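Since the original symptom was a job sitting in PENDING forever, it can help to poll the status after submitting instead of waiting indefinitely. A minimal sketch, assuming only that the client exposes get_job_status(job_id) as ray.job_submission.JobSubmissionClient does. The helper wait_for_job, its timeout defaults, and the injectable sleep argument (there so it can be exercised without a cluster) are hypothetical.

```python
import time

# Terminal states of a Ray job, as strings. The status returned by the client
# may be an enum, so it is normalised via .value before comparing.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, timeout_s=300.0, poll_s=2.0, sleep=time.sleep):
    """Poll client.get_job_status(job_id) until the job reaches a terminal state.

    Returns the final status string, or raises TimeoutError after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        raw = client.get_job_status(job_id)
        status = getattr(raw, "value", raw)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            # A job that never leaves PENDING is worth checking against the
            # runtime env setup logs on the cluster.
            hint = " (never left PENDING)" if status == "PENDING" else ""
            raise TimeoutError(f"job {job_id} is {status} after {timeout_s}s{hint}")
        sleep(poll_s)
```

With the real client it would be called as wait_for_job(client, job_id) right after client.submit_job(...), followed by client.get_job_logs(job_id) to read the job's output.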