Remote Rollout Workers use local module from repo instead of "pip-installed" module

Hello,

I have the following situation:
In my local repo I develop a Python package my_package and also build an installation wheel to install my_package via pip on my machine (…/site-packages/my_package). Furthermore, I have a (main) script where I import my_package (or parts of it) and run my RLlib training using Ray Remote (Rollout) Workers (i.e. num_workers > 0).
Now, suppose I alter something in the source code of a module (e.g. config) located in …/site-packages/my_package, while the source code in my repo folder stays unchanged. As a consequence, the Ray Remote Workers do their rollouts using the source code from my repo folder, while other parts of my RLlib training use the altered source code (config) in …/site-packages/my_package. The mismatch between the two versions leads to an exception and the death of all workers.

I guess the problem is that the Ray Remote Workers do not import my_package from the same location as the rest of my training code. It seems that the dependencies on my single-node cluster are not properly resolved.
I found the runtime_env option in the docs and tried it in my main script as follows:

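import ray
import my_package  # the package object itself is passed to py_modules below

...
# Ask Ray to make my_package available to the workers via the runtime environment.
runtime_env = {"py_modules": [my_package]}
ray.init(log_to_driver=False, runtime_env=runtime_env)
...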

At first glance the issue is solved, but I do not know whether this is the recommended way to do it.
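
For anyone who wants to check which copy of a package each process actually picks up, here is a minimal sketch. It is not taken from the Ray docs: the helper names module_origin and compare_driver_and_worker are made up for illustration, and the second helper assumes Ray is installed and that ray.init(...) has been called (or that Ray's automatic initialization is acceptable). The first helper only uses the standard library and reports the file Python would import a top-level module from.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def compare_driver_and_worker(name):
    """Hypothetical helper: resolve `name` in the driver and in one Ray task."""
    import ray  # imported lazily so module_origin works without Ray installed

    remote_origin = ray.remote(module_origin)  # wrap the plain function as a Ray task
    return {
        "driver": module_origin(name),
        "worker": ray.get(remote_origin.remote(name)),
    }
```

Calling compare_driver_and_worker("my_package") once without and once with the runtime_env setting should show whether the driver and the worker resolve the package to the same path (repo folder vs. …/site-packages/my_package).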

@klausk55, as far as I know this is the proposed way to do it.


Thanks for your feedback @Lars_Simon_Zehnder!
