I have a Python program (ray_localmodule_test.py, listed at the very end) that imports a local module (datasets.py) from the same directory and another module (Graph) from a different directory. I passed PYTHONPATH as an environment variable with a long list of paths covering all the modules needed.
However, the local module (datasets) cannot be imported properly when it is referenced from a ray.remote actor, while the module (Graph) from the other directory imports fine. As shown in the program, the statement ds.rdg_dataset_url("gnn_tester", "local") is called twice; the call inside the Ray actor causes the error below.
E ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::Worker.__init__() (pid=88366, ip=10.142.0.14)
E RuntimeError: The actor with name Worker failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:
E
E ray::Worker.__init__() (pid=88366, ip=10.142.0.14)
E File "/home/wkyu/anaconda3/envs/ray-katana-dev/lib/python3.8/site-packages/ray/cloudpickle/cloudpickle.py", line 679, in subimport
E __import__(name)
E ModuleNotFoundError: No module named 'datasets'
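For context, the failing frame above is cloudpickle calling __import__("datasets") in the worker process, which only succeeds if the directory holding datasets.py is on that process's sys.path. A minimal, Ray-free sketch of the same failure mode (demo_datasets is a throwaway module invented just for this illustration):

```python
import os
import sys
import tempfile

# Create a throwaway module in a temp directory, standing in for datasets.py.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_datasets.py"), "w") as f:
    f.write("VALUE = 42\n")

try:
    __import__("demo_datasets")  # directory not on sys.path yet
    imported = True
except ModuleNotFoundError:
    imported = False
print(imported)  # False: same failure mode as the traceback above

sys.path.insert(0, tmpdir)  # what PYTHONPATH would do at interpreter startup
mod = __import__("demo_datasets")
print(mod.VALUE)  # 42
```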
Any suggestions on this?
import datasets as ds
import numpy as np
import pytest
import ray
import ray.util.collective as collective

import katana.distributed
from katana.distributed import Graph

ray.init(namespace='coll', runtime_env={"env_vars": {"PYTHONPATH": "/home/wkyu/katana/master/external/katana/katana_python_build/build/lib.linux-x86_64-3.8:/home/wkyu/katana/master/external/katana/katana_python_build/python:/home/wkyu/katana/master/katana_enterprise_python_build/build/lib.linux-x86_64-3.8:/home/wkyu/katana/master/katana_enterprise_python_build/python:/home/wkyu/katana/master/external/katana/katana_python_build/build/lib.linux-x86_64-3.8:/home/wkyu/katana/master/external/katana/katana_python_build/python:/home/wkyu/katana/katana-enterprise-master/python/test"}})


@ray.remote
class Worker:
    def __init__(self):
        katana.distributed.initialize()
        self.send = np.ones((4,), dtype=np.float32)
        self.recv = np.zeros((4,), dtype=np.float32)
        self.graph = Graph("gs://katana-demo-datasets/unit-test-inputs/gnn/tester")
        # self.graph = ds.rdg_dataset_url("gnn_tester", "local")

    def setup(self, world_size, rank):
        collective.init_collective_group(world_size, rank, "gloo", "default")
        return True

    def compute(self):
        collective.allreduce(self.send, "default")
        return self.send


num_workers = 2
workers = []
init_rets = []
tester_graph = ds.rdg_dataset_url("gnn_tester", "local")


def test_gluon_vector_comm():
    for i in range(num_workers):
        w = Worker.remote()
        workers.append(w)
    _options = {
        "group_name": "default",
        "world_size": 2,
        "ranks": [0, 1],
        "backend": "gloo",
    }
    collective.create_collective_group(workers, **_options)
    results = ray.get([w.compute.remote() for w in workers])
    ray.shutdown()
With this, all Ray workers are supposed to run in this working_dir. But keep in mind not to make the directory too big, since it will be stored somewhere in the Ray cluster.
I'm not sure why PYTHONPATH is not being picked up and set properly via the runtime env; maybe @architkulkarni can give some ideas. Ideally, when it is set, it should be set before any other Python workers start.
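That last point is the crux: CPython reads PYTHONPATH only once, at interpreter startup, to seed sys.path. A small illustration (the directory name is made up):

```python
import os
import sys

# PYTHONPATH is consulted only at interpreter startup to seed sys.path.
# Setting it in an already-running process changes the environment that
# *child* processes will inherit, but not this process's own sys.path.
os.environ["PYTHONPATH"] = "/hypothetical/new/dir"
print("/hypothetical/new/dir" in sys.path)  # False
```

This is why the runtime env has to inject PYTHONPATH before the worker processes launch, not after.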
I am running a VM instance on GCP. There are multiple directories with various Python modules. The troubling one (datasets.py) is a small Python program in the same directory (${HOME}/katana/katana-enterprise-master/python/test) as my script (ray_module_test.py). But I was running ray_module_test.py from the directory ${HOME}/katana/master through pytest (pytest -s -v ../katana-enterprise-master/python/test/ray_module_test.py).
Setting PYTHONPATH in env_vars should work; I'm not sure why it isn't. @wkyu_katana, is it possible to print os.environ inside the Ray actor to see whether the PYTHONPATH environment variable is set correctly there?
I tried the provided suggestions. Here is a summary of my observations.
1) The script works when I use ray.init(namespace='coll', runtime_env={"working_dir": "./"}) and run it from the directory where it is located, i.e., /home/wkyu/katana/katana-enterprise-master/python/test.
2) The PYTHONPATH value given in the script is not adopted at run time. Instead, Ray takes the actual value from my current shell: the script fails with the shell's original PYTHONPATH, but works when I append /home/wkyu/katana/katana-enterprise-master/python/test to the PYTHONPATH environment variable in the shell.
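If you would rather not touch the shell, the same append can be sketched inside the script before calling ray.init. This is only a sketch under the assumptions of this thread (the test directory path is the one mentioned above, and the ray.init call is left commented out):

```python
import os

# Append the directory holding datasets.py to whatever PYTHONPATH the
# shell already exports, then hand the combined value to env_vars.
test_dir = "/home/wkyu/katana/katana-enterprise-master/python/test"
existing = os.environ.get("PYTHONPATH", "")
pythonpath = test_dir if not existing else existing + os.pathsep + test_dir

runtime_env = {"env_vars": {"PYTHONPATH": pythonpath}}
# ray.init(namespace='coll', runtime_env=runtime_env)
```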
3) When specifying Graph as a module for py_modules, i.e., runtime_env={"py_modules": [Graph], "working_dir": "./"}, I get the following error.
/home/wkyu/anaconda3/envs/ray-katana-dev/lib/python3.8/site-packages/ray/_private/runtime_env/py_modules.py:64: in upload_py_modules_if_needed
raise TypeError("py_modules must be a list of file paths, URIs, "
E TypeError: py_modules must be a list of file paths, URIs, or imported modules, got <class 'type'>.
Graph in my script is both the name of a class and the name of a module. Because of the statement from katana.distributed import Graph, the name Graph is bound to the class object rather than the module, so that is what py_modules receives. Without "py_modules": [Graph], as mentioned in 1), the script works fine.
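That shadowing is easy to confirm with inspect; here a stdlib class/module pair stands in for Graph and katana.distributed:

```python
import inspect
import collections
from collections import OrderedDict  # a class, like `from katana.distributed import Graph`

# `from X import Graph` binds the class object, not the module, which is why
# py_modules sees <class 'type'> and raises the TypeError above.
print(inspect.ismodule(OrderedDict))  # False: a class would be rejected
print(inspect.ismodule(collections))  # True: a module object is accepted
```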
It's good to know that 1) works. I think when you move to a cluster environment, you will probably need to either 1) make the cluster homogeneous, or 2) use a runtime env to prepare the environment as described here.
Thanks for the summary!
2) This sounds like it could be a bug, but I can’t reproduce it.
import os

import ray

ray.init(runtime_env={"env_vars": {"PYTHONPATH": "test"}})


@ray.remote
def f():
    return os.environ["PYTHONPATH"]


print(ray.get(f.remote()))
This prints :test as expected.
If you can find a minimal reproduction script for the PYTHONPATH issue, could you please create an issue on the Ray github and include the script? Ideally the script should not have any other dependencies.
Ah, I see, your explanation makes sense: Graph is the name of the class. In that case, if you don't want to rename the class or the module, you can pass the local file path to the module instead of the Python module object.
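A rough sketch of that workaround, with a hypothetical on-disk location for the package that provides katana.distributed (adjust to wherever it actually lives; the ray.init call is commented out):

```python
# Hypothetical path to the package directory providing katana.distributed;
# the build tree under katana_python_build/python appears in the PYTHONPATH
# earlier in this thread, but the exact package location is an assumption.
katana_pkg_path = "/home/wkyu/katana/master/external/katana/katana_python_build/python/katana"

runtime_env = {
    "py_modules": [katana_pkg_path],  # a file path string, not the Graph class object
    "working_dir": "./",
}
# ray.init(namespace='coll', runtime_env=runtime_env)
```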