Ray client fails when specifying Conda Environment

sfigueroa · October 14, 2021, 6:57pm

Hello,

I am trying to deploy a Ray cluster on Kubernetes while specifying a particular conda environment for the workers but am unable to do so.

I am using Ray v1.7.0 with Python 3.8 and MicroK8s v1.22.2 as the Kubernetes environment, and am following the steps in "Installing the Ray Operator with Helm".

The cluster works great when I call:

ray.init("ray://192.168.1.191:10001")

and I am able to set runtime environments like

ray.init("ray://192.168.1.191:10001",
         runtime_env={"env_vars": {
             "OMP_NUM_THREADS": "32", "TF_WARNINGS": "none"
         }})

with no issue.

The problem arises when I try to specify a conda environment.

ray.init("ray://192.168.1.191:10001",
         runtime_env={"conda": {
             "dependencies": ["pip", {
                 "pip": ["pendulum"]
             }]
         }})

I get the following error:

ConnectionAbortedError: Initialization failure from server:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 612, in Datapath
    if not self.proxy_manager.start_specific_server(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in start_specific_server
    serialized_runtime_env_context = self._create_runtime_env(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 225, in _create_runtime_env
    raise RuntimeError(
RuntimeError: Failed to create runtime_env for Ray client server: [Errno 2] No such file or directory: '/tmp/ray/session_2021-10-14_09-40-18_105353_113/runtime_resources/conda/ray-e10fe98776459d9b5c7be1d91a5dcb02493e4749'

I have also tried adding the conda environment to the rayImage and simply calling the prebuilt conda environment. In that case I get:

ConnectionAbortedError: Initialization failure from server:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 612, in Datapath
    if not self.proxy_manager.start_specific_server(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in start_specific_server
    serialized_runtime_env_context = self._create_runtime_env(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 250, in _create_runtime_env
    raise TimeoutError(
TimeoutError: CreateRuntimeEnv request failed after 5 attempts.

Any help on this would be greatly appreciated!

architkulkarni · October 15, 2021, 11:25pm

Hi @sfigueroa, sorry you’re running into this issue, and thanks for reporting it. This looks like a bug; we’ll get a fix out as soon as possible.

architkulkarni · October 15, 2021, 11:48pm

Hi @sfigueroa, can you try pip install "ray[default]" and see if that fixes things? This installs some dependencies that are required for runtime environments.

We’ll add a better error message that detects when ray[default] isn’t installed, instead of failing mysteriously as you experienced. Sorry about that!
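One rough way to check a node before retrying is to test whether a few of the modules that the default extras pull in can be imported there. This is only a sketch: the module list below is an assumption (the exact set of extras differs between Ray releases), and `missing_modules` is a hypothetical helper, not part of Ray.

```python
import importlib.util

# Assumption: a few top-level modules that "ray[default]" installs.
# The real set depends on the Ray version, so adjust this list as needed.
ASSUMED_DEFAULT_EXTRA_MODULES = ["aiohttp", "aiohttp_cors", "colorful"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found on this node."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(["ray"] + ASSUMED_DEFAULT_EXTRA_MODULES)
    if missing:
        print("Not importable here:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Running this inside the head pod and each worker pod would show whether the image itself is missing the extras, as opposed to only the local client machine.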

dmitry.karpeyev · November 19, 2021, 2:04am

I am running into the same problem with Python 3.7 and Ray 1.8.0, both locally and on Kubernetes (GKE) deployed using a current Ray Helm chart. Running pip install ray[default] locally doesn’t seem to help. Any suggestions on how to fix this?

I should add that instead of conda dependencies I am specifying pip dependencies in the runtime_env dict.
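For reference, a pip-based runtime_env of the kind described here might look like the sketch below. The package name is borrowed from the original post, the address is a placeholder, and `build_pip_runtime_env` and `connect` are hypothetical helper names used only for illustration.

```python
def build_pip_runtime_env(packages):
    """Build a runtime_env dict that asks Ray to pip-install the packages."""
    return {"pip": list(packages)}


def connect(address, packages):
    """Connect through Ray Client with the pip runtime_env applied."""
    import ray  # imported lazily so the helper above works without Ray

    return ray.init(address, runtime_env=build_pip_runtime_env(packages))


if __name__ == "__main__":
    # Printing the dict does not need a running cluster.
    print(build_pip_runtime_env(["pendulum"]))
    # To actually connect (address is a placeholder):
    # connect("ray://<head-node-ip>:10001", ["pendulum"])
```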

My wild guess is that this has something to do with the ephemeral nature of container storage where /tmp resides? I could be way off, though.

Dmitry

architkulkarni · December 6, 2021, 8:06pm

Hi @dmitry.karpeyev, sorry for the late response here. You would need ray[default] on all nodes of the cluster. Could you share the logs if you have them? There should be logs for the pip installation in dashboard_agent.log or ray_client_server_... files.
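A small script along these lines could help collect those files from a node. It assumes the logs sit under /tmp/ray/session_latest/logs, which is Ray's usual default but may differ if a custom temp directory is configured; `find_runtime_env_logs` is a hypothetical helper name.

```python
import glob
import os

# Assumption: Ray's default per-node log directory.
DEFAULT_LOG_DIR = "/tmp/ray/session_latest/logs"


def find_runtime_env_logs(log_dir=DEFAULT_LOG_DIR):
    """Return sorted paths of the log files mentioned above, if present."""
    patterns = ["dashboard_agent.log", "ray_client_server*"]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(log_dir, pattern)))
    return sorted(found)


if __name__ == "__main__":
    for path in find_runtime_env_logs():
        print(path)
```

On Kubernetes this would need to run inside the head pod (for example via kubectl exec), since the files live in the container's filesystem.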
