I’m a beginner with Ray. I created a Ray cluster on Kubernetes, customized with a Dockerfile that installs conda and creates my conda environments. My Dockerfile looks like this:
FROM rayproject/ray:2.8.0-py310
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN sudo chmod 666 /var/lib/apt/lists/*
RUN sudo apt-get update
RUN sudo apt-get install -y wget && sudo rm -rf /var/lib/apt/lists/*
RUN sudo wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && sudo mkdir /root/.conda \
    && sudo bash Miniconda3-latest-Linux-x86_64.sh -b \
    && sudo rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda --version
# Create the /app directory
WORKDIR /app
# Copy environment files to the /app directory
COPY envs/base_env.yml /app/base_env.yml
COPY envs/custom_env.yml /app/custom_env.yml
# Create Conda environments
RUN conda env create -f base_env.yml && \
    conda env create -f custom_env.yml
# Activate the base environment and install additional packages (if needed)
RUN echo "Activating base environment" && \
    echo "conda activate base_env" >> ~/.bashrc
# Set the default command to start the application
CMD ["bash"]
with the following yaml values:
image:
  repository: my-dev-registry:5000/ray
  tag: latest
  pullPolicy: IfNotPresent
head:
  enableInTreeAutoscaling: true
  rayStartParams:
    # don't use head node for scheduling
    # as per https://github.com/ray-project/kuberay/blob/master/docs/best-practice/worker-head-reconnection.md#best-practice
    num-cpus: 0
  resources:
    limits:
      cpu: "6"
      memory: "8G"
    requests:
      cpu: "6"
      memory: "8G"
worker:
  replicas: 2
  minReplicas: 2
  maxReplicas: 10
  resources:
    limits:
      cpu: "8"
      memory: "8G"
    requests:
      cpu: "8"
      memory: "8G"
and the tilt file:
# install kuberay
..
# install kuberay-operator
..
# install ray cluster
helm_resource(
    name="raycluster",
    chart="kuberay/ray-cluster",
    release_name="raycluster",
    namespace="ray-prototype",
    flags=[
        "--create-namespace",
        "--version=0.6.0",
        "--wait",
        "--debug",
        "--values=infra/raycluster.yaml",
    ],
)
In the terminal, I start Ray using ray start --head --ray-debugger-external. Submitting jobs and running workflows works fine until I add the conda environment, as follows:
# either by submitting a job
ray job submit --address http://localhost:8265 --runtime-env-json='{"working_dir": ".", "conda": "base_env"}' -- python rayscripts/workflow.py
# or inside the python script
ray.init(
    address="auto",
    runtime_env={
        "working_dir": "../",
        "conda": "base_env",  # <---- this line
    },
)
I get the following errors:
File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'conda'
File "/home/alikleit/.local/lib/python3.10/site-packages/ray/_private/runtime_env/conda_utils.py", line 170, in get_conda_env_list
raise ValueError(f"Could not find Conda executable at {conda_path}.")
ValueError: Could not find Conda executable at conda.
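A minimal sketch for narrowing this down. The error names the bare string conda rather than a full path, which would be consistent with the lookup falling back to PATH. The sketch assumes (this is an assumption about conda_utils.py, not verified against Ray 2.8.0) that Ray tries the RAY_CONDA_HOME environment variable first, then CONDA_EXE, and otherwise the bare name conda. The helper names guess_conda_candidate and report are hypothetical. Since the traceback path sits under /home/alikleit/.local, it may be worth running this in the same Python process that calls ray.init(), not only inside the pod.

```python
import os
import shutil


def guess_conda_candidate(env):
    """Return the conda path the ASSUMED lookup order would try.

    Assumed order (not confirmed for Ray 2.8.0):
    RAY_CONDA_HOME, then CONDA_EXE, then the bare name "conda".
    """
    conda_home = env.get("RAY_CONDA_HOME")
    if conda_home:
        return os.path.join(conda_home, "bin", "conda")
    conda_exe = env.get("CONDA_EXE")
    if conda_exe:
        return os.path.join(os.path.dirname(conda_exe), "conda")
    return "conda"  # bare name, resolved through PATH


def report(env=None):
    """Show which candidate would be tried and whether it resolves to an executable."""
    env = os.environ if env is None else env
    candidate = guess_conda_candidate(env)
    resolved = shutil.which(candidate, path=env.get("PATH", ""))
    return {"candidate": candidate, "resolved": resolved}


if __name__ == "__main__":
    print(report())
```

If resolved comes back as None in the process that starts Ray, that process cannot find conda regardless of what the head pod's shell reports.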
I inspected the head node using kubectl exec -it $HEAD_POD -- sh and ran conda env list, which gives the following:
# conda environments:
#
base /home/ray/anaconda3
base_env /home/ray/anaconda3/envs/base_env
custom_env /home/ray/anaconda3/envs/custom_env
# and using which conda
/home/ray/anaconda3/bin/conda
As another check, I tried to see whether conda is available while the workflow executes, by removing the conda entry from the init() call and adding the following to a @ray.remote function:
result = subprocess.run(["conda", "env", "list"], capture_output=True, text=True)
There I also get FileNotFoundError: [Errno 2] No such file or directory: 'conda'.
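A variant of that check that does not raise, sketched here under some assumptions: it reports the hostname, the PATH of the current process, and whether conda resolves, so the output shows which machine and environment the code actually ran in. The function name conda_visibility is hypothetical, and the default candidate path is simply the location that which conda printed in the head pod.

```python
import os
import shutil
import socket


def conda_visibility(candidates=("/home/ray/anaconda3/bin/conda",)):
    """Describe whether conda is reachable from the current process, without raising."""
    return {
        "host": socket.gethostname(),            # which machine ran this
        "path": os.environ.get("PATH", ""),      # PATH as seen by this process
        "which_conda": shutil.which("conda"),    # None if conda is not on PATH
        "existing_candidates": [p for p in candidates if os.path.isfile(p)],
    }


if __name__ == "__main__":
    # Runs in the local process. To run it as a Ray task instead, the same
    # function could be wrapped, e.g. ray.get(ray.remote(conda_visibility).remote()).
    print(conda_visibility())
```

Comparing the host value against the pod names from kubectl would show whether the task ran on the Kubernetes cluster or on a Ray instance started locally with ray start --head.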
Is there something I’m missing?