How to invoke conda env in subprocess during a ray job?

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.
  • The goal is to have Python invoke a command-line program via subprocess.

  • This command-line program is installed via conda.

  • The subprocess call has to go through the conda environment to run this program.

Given that ray mangles the conda environment name in runtime_env definitions, what is the best approach to support this use-case?

So far, I see two paths:

  1. (hacky) List the conda environments on the Ray worker, search each one for the command-line program, and invoke subprocess with whichever path is found.
  2. (better?) Replace the conda install of the command-line program with an apt-get package, installed in the setup step of the cluster.yaml definition, so the program is available at the OS level.
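
For reference, path 1 could look roughly like the sketch below. It assumes the conda environments live as subdirectories of a single root directory and that executables sit in each environment's `bin/` folder (Unix layout). The function name `find_program_in_envs` is hypothetical, and nothing here is Ray API.

```python
import os
from pathlib import Path
from typing import Optional


def find_program_in_envs(envs_root: str, program: str) -> Optional[Path]:
    """Return the first <env>/bin/<program> under envs_root that is executable.

    Assumes a Unix-style layout where each subdirectory of envs_root is a
    conda environment. Returns None if no environment provides the program.
    """
    root = Path(envs_root)
    if not root.is_dir():
        return None
    # Sort so the result is deterministic when several envs have the program.
    for env_dir in sorted(root.iterdir()):
        candidate = env_dir / "bin" / program
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Something like `find_program_in_envs("/path/to/envs", "obabel")` would then give a path to hand to `subprocess.run`. The root directory is a placeholder, since it depends on where conda keeps its environments on the worker.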

Any other thoughts are appreciated.

Hmm not sure that I understand the issue exactly. Can you not invoke the subprocess with the full path of the desired conda env? E.g., /home/<user>/anaconda3/envs/<env>/bin/...?
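
A minimal sketch of that suggestion, assuming a Unix-style environment layout where executables live under `<env prefix>/bin`. The helper name `run_from_env` is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_from_env(env_prefix: str, program: str,
                 args: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Run <env_prefix>/bin/<program> directly, without activating the env."""
    executable = Path(env_prefix) / "bin" / program
    return subprocess.run(
        [str(executable), *args],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode the output as str
    )
```

Calling the binary by its full path skips conda's activation step, so a program that depends on variables set during activation may behave differently than it would under `conda activate`.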

Hi Stephanie,

You are correct.

So to clarify, my original question was a general one: how to invoke command-line programs via subprocess in Python, inside a Ray process that was set up with a conda environment. The issue is that the user does not know a priori the name of the conda environment Ray sets up for you.

The solution:

You need a handle to the conda environment that Ray activated on behalf of your defined environment.

  • Since Ray did not provide a conda label for this environment, you will get a conda path to your environment.
  • Then you need to execute ‘conda run -p’, with the -p flag for the path.
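
The two steps above can be sketched as a small helper that only builds the command, so it can be checked without conda installed. The name `conda_run_command` is hypothetical. It reads `CONDA_DEFAULT_ENV` as described above and, as an assumption that is not from this thread, falls back to `CONDA_PREFIX`, which conda sets to the active environment's path on activation.

```python
import os
from typing import List, Mapping, Sequence


def conda_run_command(program_args: Sequence[str],
                      environ: Mapping[str, str] = os.environ) -> List[str]:
    """Build a `conda run -p <prefix> ...` command for the active conda env."""
    # Inside a Ray conda runtime env, CONDA_DEFAULT_ENV holds a path, not a name.
    prefix = environ.get("CONDA_DEFAULT_ENV") or environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("no active conda environment found in the environment variables")
    return ["conda", "run", "-p", prefix, *program_args]
```

Inside a Ray task the result could be passed straight to `subprocess.run(conda_run_command(["obabel", "-h"]), capture_output=True, text=True)`. Passing the command as a list avoids quoting problems if the environment path contains spaces.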

Below is a self-contained example that you can submit to your Ray cluster:

```shell
ray job submit --runtime-env my_ray_env.yml --address http://localhost:8265 --working-dir . -- python test_subprocess_and_conda.py
```

```yaml
# my_ray_env.yml
conda:
  channels:
    - conda-forge
  dependencies:
    - openbabel

working_dir: "."
```

```python
# test_subprocess_and_conda.py
import os
import subprocess

import ray
from ray.runtime_env import RuntimeEnv

runtime_env = RuntimeEnv(conda={
    "channels": ["conda-forge"], "dependencies": ["openbabel"]})

ray.init()

@ray.remote(runtime_env=runtime_env)
def f(x):
    # Get the conda environment that Ray set up and activated for this task
    # (NB: this only applies if conda was used to set up the env in Ray).
    active_conda_env = os.environ['CONDA_DEFAULT_ENV']
    # Ray did not define a conda env name, so this value is the path to the
    # environment; use the -p flag to invoke it.
    output = subprocess.run(
        ["conda", "run", "-p", active_conda_env, "obabel", "-h"],
        capture_output=True, text=True)
    # Return the program's stdout so that ray.get() below has something to print.
    return output.stdout

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures))
```

Hmm sorry I think I’m still misunderstanding – does the script you provided work as intended or is there still an issue?

Yes, it is the working script, for anyone who wants to run subprocess commands from Python within Ray and conda.

To further clarify, there is no more issue :)
Thanks, Stephanie.