In the config of my PPOTrainer I had set "num_workers": 0 so far, but when I change it to num_workers > 0, it raises a ModuleNotFoundError: No module named … for the import of my custom env. Is this the same issue as the one mentioned here?
I'm not familiar with all the effects of num_workers > 0. I'm working on a single machine (a desktop computer) and don't know whether num_workers > 0 works in this case at all, or whether it is only meant for remote workers running on multiple machines.
Any enlightening thoughts are really appreciated.
The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=16960, ip=192.168.250.100)
  File "python\ray\_raylet.pyx", line 460, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 481, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 351, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: No module named 'galvcon'
traceback: Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\serialization.py", line 248, in deserialize_objects
    obj = self._deserialize_object(data, metadata, object_ref)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\serialization.py", line 190, in _deserialize_object
    return self._deserialize_msgpack_data(data, metadata_fields)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\serialization.py", line 168, in _deserialize_msgpack_data
    python_objects = self._deserialize_pickle5_data(pickle5_data)
  File "C:\Users\user\AppData\Local\Programs\Python\Python38\lib\site-packages\ray\serialization.py", line 158, in _deserialize_pickle5_data
    obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'galvcon'
Yes, you can run with more than one worker even on a single machine.
Each worker runs in a separate process. What you are likely seeing is that the worker processes do not have the same path or environment variables set as your main process and therefore cannot find your module.
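One way to address this (a sketch, not the poster's code) is Ray's runtime_env feature, which ships a local directory to every worker process and makes it their working directory, so the custom env module is importable remotely as well. The "." below is an assumption: the project root containing the module.

```python
# Sketch, assuming Ray's runtime_env support: "working_dir" is packaged
# and becomes the cwd of every worker process, so imports of project
# modules also resolve in the remote workers.
runtime_env = {"working_dir": "."}  # assumption: project root with my_module/

# Passed once at startup, before creating the Trainer:
#   import ray
#   ray.init(runtime_env=runtime_env)
```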
@TanjaBayer I have a file called main.py where I import the custom RL env, i.e. from my_module.sub_folder.sub_sub_folder.my_env import MyEnv. In main.py I create an RL Trainer which is given this custom env. With "num_workers": 0 in the Trainer config everything works fine, since there is only a "local worker", but with "num_workers" > 0 (i.e. "local worker" + "remote workers") the error occurs.
As a first workaround I copied the my_module folder into the rllib folder and changed the import in main.py accordingly: from ray.rllib.my_module.sub_folder.sub_sub_folder.my_env import MyEnv
Remote workers would fail to deserialize data from modules imported from within the project. In my case, deserialization was attempted from the workers' initial directory instead of the working directory. I fixed the issue by running a worker_process_setup_hook function before submitting tasks to the workers; this function performs an os.chdir so that the (relative) imports resolve correctly:
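The hook described above might look roughly like this (a sketch, assuming Ray's experimental worker_process_setup_hook runtime_env option; the PROJECT_ROOT path and the function name setup_worker are hypothetical placeholders, not the poster's actual code):

```python
import os

# Hypothetical path: adjust to the directory that contains the project
# module (e.g. the folder holding galvcon/).
PROJECT_ROOT = "/path/to/project"

def setup_worker(root: str = PROJECT_ROOT) -> None:
    # Runs once in each worker process before any task executes, so that
    # pickle can re-import project modules relative to the project root.
    os.chdir(root)

# Registered once, before submitting tasks (experimental Ray API):
#   import ray
#   ray.init(runtime_env={"worker_process_setup_hook": setup_worker})
```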