Hello,
I am trying to deploy a Ray cluster on Kubernetes while specifying a particular conda environment for the workers but am unable to do so.
I am using Ray v1.7.0 with Python 3.8 and MicroK8s v1.22.2 as the Kubernetes environment, and am following the steps in "Installing the Ray Operator with Helm".
The cluster works great when I call:
ray.init("ray://192.168.1.191:10001")
and I am able to set runtime environments like
ray.init("ray://192.168.1.191:10001",
         runtime_env={"env_vars": {
             "OMP_NUM_THREADS": "32", "TF_WARNINGS": "none"
         }})
with no issue.
The problem arises when I try to specify a conda environment:
ray.init("ray://192.168.1.191:10001",
         runtime_env={"conda": {
             "dependencies": ["pip", {
                 "pip": ["pendulum"]
             }]
         }})
I get the following error:
ConnectionAbortedError: Initialization failure from server:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 612, in Datapath
    if not self.proxy_manager.start_specific_server(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in start_specific_server
    serialized_runtime_env_context = self._create_runtime_env(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 225, in _create_runtime_env
    raise RuntimeError(
RuntimeError: Failed to create runtime_env for Ray client server: [Errno 2] No such file or directory: '/tmp/ray/session_2021-10-14_09-40-18_105353_113/runtime_resources/conda/ray-e10fe98776459d9b5c7be1d91a5dcb02493e4749'
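For anyone who wants to reproduce this, here is a minimal sketch that tries the working env_vars case and the failing conda case back to back. The helper names (`env_vars_runtime_env`, `conda_runtime_env`, `try_connect`) are made up for illustration; only `ray.init` and `ray.shutdown` are Ray calls, and the address is the one from my cluster.

```python
from typing import Any, Dict

# Ray Client address of the head node (taken from the post; adjust for your cluster).
RAY_ADDRESS = "ray://192.168.1.191:10001"


def env_vars_runtime_env() -> Dict[str, Any]:
    """Runtime env that only sets environment variables (this one works)."""
    return {"env_vars": {"OMP_NUM_THREADS": "32", "TF_WARNINGS": "none"}}


def conda_runtime_env() -> Dict[str, Any]:
    """Runtime env with an inline conda spec (this one fails for me)."""
    return {"conda": {"dependencies": ["pip", {"pip": ["pendulum"]}]}}


def try_connect(runtime_env: Dict[str, Any]) -> str:
    """Hypothetical helper: connect with the given runtime env and report the outcome."""
    import ray  # imported lazily so the dict builders above work without Ray installed

    try:
        ray.init(RAY_ADDRESS, runtime_env=runtime_env)
    except Exception as exc:  # broad on purpose: we only want to see what failed
        return f"{type(exc).__name__}: {exc}"
    ray.shutdown()
    return "ok"


if __name__ == "__main__":
    for name, env in [("env_vars", env_vars_runtime_env()),
                      ("conda", conda_runtime_env())]:
        print(name, "->", try_connect(env))
```

Running the two cases from the same script should make it clear that the connection itself is fine and that only the conda runtime env creation fails.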
I have also tried adding the conda environment to the rayImage and simply referencing that prebuilt environment instead. In that case I get:
ConnectionAbortedError: Initialization failure from server:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 612, in Datapath
    if not self.proxy_manager.start_specific_server(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in start_specific_server
    serialized_runtime_env_context = self._create_runtime_env(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 250, in _create_runtime_env
    raise TimeoutError(
TimeoutError: CreateRuntimeEnv request failed after 5 attempts.
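A rough sketch of what the prebuilt-environment attempt looks like, assuming the environment baked into the image is named `my_env` (a placeholder, not the real name). As far as I understand, the `conda` field of `runtime_env` accepts the name of an existing environment as a string. The helpers `named_conda_runtime_env` and `worker_python` are hypothetical; the second one would report which interpreter a task runs under if the connection succeeded.

```python
from typing import Any, Dict


def named_conda_runtime_env(env_name: str) -> Dict[str, Any]:
    """Build a runtime env that points at an existing conda env by name."""
    if not env_name:
        raise ValueError("env_name must be a non-empty string")
    return {"conda": env_name}


def worker_python(address: str, runtime_env: Dict[str, Any]) -> str:
    """Hypothetical check: return the Python executable path used by a remote task."""
    import ray  # imported lazily; requires Ray on the client side

    ray.init(address, runtime_env=runtime_env)
    try:
        @ray.remote
        def which_python() -> str:
            import sys
            return sys.executable

        return ray.get(which_python.remote())
    finally:
        ray.shutdown()


if __name__ == "__main__":
    # "my_env" is a placeholder for the environment name built into the image.
    env = named_conda_runtime_env("my_env")
    print(worker_python("ray://192.168.1.191:10001", env))
```

If the returned path pointed inside the named environment, the workers would be picking it up correctly; in my case the call never gets that far because of the TimeoutError above.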
Any help on this would be greatly appreciated!