I have a Python program (ray_localmodule_test.py, listed at the very end) that imports a local module (datasets.py) from the same directory and another module (Graph) from a different directory. I passed PYTHONPATH as an environment variable with a long list of paths covering all the modules needed.
However, the local module (datasets) cannot be imported properly when it is referenced from a ray.remote actor, while the module (Graph) from the other directory imports fine. As shown in the program, the statement ds.rdg_dataset_url("gnn_tester", "local") is called twice; the call inside the Ray actor causes the error below.
E ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::Worker.__init__() (pid=88366, ip=10.142.0.14)
E RuntimeError: The actor with name Worker failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:
E
E ray::Worker.__init__() (pid=88366, ip=10.142.0.14)
E File "/home/wkyu/anaconda3/envs/ray-katana-dev/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle.py", line 679, in subimport
E __import__(name)
E ModuleNotFoundError: No module named 'datasets'
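For context, the failing frame above is cloudpickle calling __import__("datasets") in the worker process, which only succeeds if the directory holding datasets.py is on that process's sys.path. A minimal, Ray-free sketch of the same failure mode (demo_datasets is a throwaway module invented just for this illustration):

```python
import os
import sys
import tempfile

# Create a throwaway module in a temp directory, standing in for datasets.py.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_datasets.py"), "w") as f:
    f.write("VALUE = 42\n")

try:
    __import__("demo_datasets")  # directory not on sys.path yet
    imported = True
except ModuleNotFoundError:
    imported = False
print(imported)  # False: same failure mode as the traceback above

sys.path.insert(0, tmpdir)  # what PYTHONPATH would do at interpreter startup
mod = __import__("demo_datasets")
print(mod.VALUE)  # 42
```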
Any suggestions on this?
import datasets as ds
import numpy as np
import pytest
import ray
import ray.util.collective as collective

import katana.distributed
from katana.distributed import Graph

ray.init(namespace='coll', runtime_env={"env_vars": {"PYTHONPATH": "/home/wkyu/katana/master/external/katana/katana_python_build/build/lib.linux-x86_64-3.8:/home/wkyu/katana/master/external/katana/katana_python_build/python:/home/wkyu/katana/master/katana_enterprise_python_build/build/lib.linux-x86_64-3.8:/home/wkyu/katana/master/katana_enterprise_python_build/python:/home/wkyu/katana/master/external/katana/katana_python_build/build/lib.linux-x86_64-3.8:/home/wkyu/katana/master/external/katana/katana_python_build/python:/home/wkyu/katana/katana-enterprise-master/python/test"}})


@ray.remote
class Worker:
    def __init__(self):
        katana.distributed.initialize()
        self.send = np.ones((4,), dtype=np.float32)
        self.recv = np.zeros((4,), dtype=np.float32)
        self.graph = Graph("gs://katana-demo-datasets/unit-test-inputs/gnn/tester")
        # self.graph = ds.rdg_dataset_url("gnn_tester", "local")

    def setup(self, world_size, rank):
        collective.init_collective_group(world_size, rank, "gloo", "default")
        return True

    def compute(self):
        collective.allreduce(self.send, "default")
        return self.send


num_workers = 2
workers = []
init_rets = []
tester_graph = ds.rdg_dataset_url("gnn_tester", "local")


def test_gluon_vector_comm():
    for i in range(num_workers):
        w = Worker.remote()
        workers.append(w)
    _options = {
        "group_name": "default",
        "world_size": 2,
        "ranks": [0, 1],
        "backend": "gloo",
    }
    collective.create_collective_group(workers, **_options)
    results = ray.get([w.compute.remote() for w in workers])
    ray.shutdown()
With this, all Ray workers are supposed to run in this working_dir. But keep in mind not to make the directory too big, since it will be stored somewhere in the Ray cluster.
I'm not sure why PYTHONPATH is not being picked up and set properly via the runtime env; maybe @architkulkarni can give some ideas. Ideally, when it is set, it should be set before any other Python workers start.
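That last point is the crux: CPython reads PYTHONPATH only once, at interpreter startup, to seed sys.path. A small illustration (the directory name is made up):

```python
import os
import sys

# PYTHONPATH is consulted only at interpreter startup to seed sys.path.
# Setting it in an already-running process changes the environment that
# *child* processes will inherit, but not this process's own sys.path.
os.environ["PYTHONPATH"] = "/hypothetical/new/dir"
print("/hypothetical/new/dir" in sys.path)  # False
```

This is why the runtime env has to inject PYTHONPATH before the worker processes launch, not after.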
I am running a VM instance on GCP. There are multiple directories with various Python modules. The troubling one (datasets.py) is a small Python program in the same directory (${HOME}/katana/katana-enterprise-master/python/test) as my script (ray_module_test.py). But I was running ray_module_test.py from the directory ${HOME}/katana/master through pytest (pytest -s -v ../katana-enterprise-master/python/test/ray_module_test.py).
Setting PYTHONPATH in env_vars should work; I'm not sure why it isn't. @wkyu_katana, is it possible to print os.environ inside the Ray actor to see whether the PYTHONPATH environment variable is set correctly there?
I tried the provided suggestions. Here is a summary of my observations.
1) The script works when I use ray.init(namespace='coll', runtime_env={"working_dir": "./"}) and run it from the directory where it is located, i.e., /home/wkyu/katana/katana-enterprise-master/python/test.
2) The PYTHONPATH value given in the script is not adopted at run time. Instead, Ray takes the actual value from my current shell: the script fails with the shell's original PYTHONPATH, but works when I append /home/wkyu/katana/katana-enterprise-master/python/test to the PYTHONPATH environment variable in the shell.
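If you would rather not touch the shell, the same append can be sketched inside the script before calling ray.init. This is only a sketch under the assumptions of this thread (the test directory path is the one mentioned above, and the ray.init call is left commented out):

```python
import os

# Append the directory holding datasets.py to whatever PYTHONPATH the
# shell already exports, then hand the combined value to env_vars.
test_dir = "/home/wkyu/katana/katana-enterprise-master/python/test"
existing = os.environ.get("PYTHONPATH", "")
pythonpath = test_dir if not existing else existing + os.pathsep + test_dir

runtime_env = {"env_vars": {"PYTHONPATH": pythonpath}}
# ray.init(namespace='coll', runtime_env=runtime_env)
```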
3) When specifying Graph as a module for py_modules, i.e., runtime_env={"py_modules": [Graph], "working_dir": "./"}, I get the following error.
/home/wkyu/anaconda3/envs/ray-katana-dev/lib/python3.8/site-packages/ray/_private/runtime_env/py_modules.py:64: in upload_py_modules_if_needed
raise TypeError("py_modules must be a list of file paths, URIs, "
E TypeError: py_modules must be a list of file paths, URIs, or imported modules, got <class 'type'>.
Graph in my script is both the name of a class and the name of a module. Because of the statement from katana.distributed import Graph, the name Graph is bound to the class object rather than the module, so that is what py_modules receives. Without "py_modules": [Graph], as mentioned in 1), the script works fine.
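That shadowing is easy to confirm with inspect; here a stdlib class/module pair stands in for Graph and katana.distributed:

```python
import inspect
import collections
from collections import OrderedDict  # a class, like `from katana.distributed import Graph`

# `from X import Graph` binds the class object, not the module, which is why
# py_modules sees <class 'type'> and raises the TypeError above.
print(inspect.ismodule(OrderedDict))  # False: a class would be rejected
print(inspect.ismodule(collections))  # True: a module object is accepted
```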
It's good to know that 1) works. I think when you move to a cluster environment, you will probably need to either 1) make the cluster homogeneous, or 2) use a runtime env to prepare the environment as described here.
Thanks for the summary!
2) This sounds like it could be a bug, but I can’t reproduce it.
import os

import ray

ray.init(runtime_env={"env_vars": {"PYTHONPATH": "test"}})


@ray.remote
def f():
    return os.environ["PYTHONPATH"]


print(ray.get(f.remote()))
This prints :test as expected.
If you can find a minimal reproduction script for the PYTHONPATH issue, could you please create an issue on the Ray github and include the script? Ideally the script should not have any other dependencies.
Ah, I see, your explanation makes sense: Graph is the name of the class. In that case, if you don't want to rename the class or the module, you can pass the local file path to the module instead of the Python module object.
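A rough sketch of that workaround, with a hypothetical on-disk location for the package that provides katana.distributed (adjust to wherever it actually lives; the ray.init call is commented out):

```python
# Hypothetical path to the package directory providing katana.distributed;
# the build tree under katana_python_build/python appears in the PYTHONPATH
# earlier in this thread, but the exact package location is an assumption.
katana_pkg_path = "/home/wkyu/katana/master/external/katana/katana_python_build/python/katana"

runtime_env = {
    "py_modules": [katana_pkg_path],  # a file path string, not the Graph class object
    "working_dir": "./",
}
# ray.init(namespace='coll', runtime_env=runtime_env)
```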