Running a Ray Cluster Inside a Subprocess

1. Severity of the issue:
Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.51.1
  • Python version: 3.11.9
  • OS: Windows 11

3. What happened vs. what you expected:

I am attempting to work around errors when running Ray inside a PyInstaller-compiled .exe by using a separately included Python interpreter. Starting a Python subprocess, launching a Ray cluster from it, and then connecting to that cluster from the main program works. However, any remote actor call that has to transmit data of a type defined by a third-party library (e.g., a NumPy array) fails or times out. A minimal reproducible example is below. Perhaps the Ray cluster's runtime environment needs to have already imported the libraries that will be used? I'd appreciate any advice on making the Tester_Failure actor below succeed. Thanks to all in advance.

```python
import numpy as np
import ray
import subprocess
import time

# Start a Ray cluster inside a Python subprocess by feeding commands
# to an interactive interpreter through stdin
process = subprocess.Popen(['python', '-i'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
command = "import ray; _ = ray.init()\n".encode("utf-8")
process.stdin.write(command)
process.stdin.flush()
time.sleep(30)  # give the cluster time to start

# Connect to the existing Ray cluster
_ = ray.init(address='auto')

# Perform a task that does not involve data requiring an external library
@ray.remote
class Tester_Success:
    def test(self, string):
        return 0

actor = Tester_Success.remote()
task = actor.test.remote('Test')
try:
    result = ray.get(task, timeout=10)
except Exception as exc:  # e.g. ray.exceptions.GetTimeoutError
    print('Test failed:', exc)

# Perform a task involving data that requires an external library (numpy)
@ray.remote
class Tester_Failure:
    def test(self, array):
        return 0

actor = Tester_Failure.remote()
task = actor.test.remote(np.zeros((10, 10)))
try:
    result = ray.get(task, timeout=10)
except Exception as exc:  # e.g. ray.exceptions.GetTimeoutError
    print('Test failed:', exc)
```

It seems likely that your issue is due to the Ray worker subprocesses not having the same Python environment (including third-party libraries like numpy) as your main process. When Ray schedules remote tasks or actors, the worker processes must be able to import and use all required libraries. If the Ray cluster is started in a subprocess with a different environment, or if numpy is not installed/imported in that subprocess, serialization/deserialization of numpy arrays will fail, causing timeouts or errors in remote calls involving numpy data. This is a common pitfall when using Ray with subprocesses or pyinstaller-compiled executables, especially on Windows, as Ray expects a consistent Python environment across all processes and nodes (see related discussion).

My understanding is that to resolve this, you should ensure that the Ray worker subprocess is started with the same environment as your main process, and that all required libraries (like numpy) are installed and importable in that environment. You may also need to explicitly import numpy in the subprocess before starting Ray, so that the worker processes inherit the correct module state. Alternatively, consider using Ray’s runtime_env feature to specify dependencies, though this may have limitations on Windows. Running Ray in subprocesses or from pyinstaller executables is not a well-supported or tested pattern, and behaviors may be undefined (see Ray team response).
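One way to check the "same environment" point directly: the repro launches whatever `python` is first on PATH, which need not be the interpreter running the main script. A minimal diagnostic sketch, using only the standard library (`describe_interpreter` is a hypothetical helper, not part of Ray), runs a given interpreter in a child process and reports its path, its version, and whether each named module actually imports there. It attempts the import rather than just locating the module, so failures such as a missing DLL would also show up.

```python
import json
import subprocess
import sys

# Program fed to the interpreter under test via "python -". It tries to import
# each module named on its command line and reports the results as JSON.
_PROBE = """
import importlib, json, sys
result = {"executable": sys.executable, "version": sys.version.split()[0], "imports": {}}
for name in sys.argv[1:]:
    try:
        importlib.import_module(name)
        result["imports"][name] = "ok"
    except Exception as exc:  # ImportError, or e.g. a DLL load failure
        result["imports"][name] = f"{type(exc).__name__}: {exc}"
print(json.dumps(result))
"""


def describe_interpreter(python_exe, modules):
    """Run `python_exe` in a child process and report where it lives, its
    version, and whether each module in `modules` can be imported there."""
    completed = subprocess.run(
        [python_exe, "-", *modules],
        input=_PROBE.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    # Use the last line, in case an imported module printed something itself.
    last_line = completed.stdout.decode("utf-8").strip().splitlines()[-1]
    return json.loads(last_line)


if __name__ == "__main__":
    # Replace sys.executable with the interpreter the subprocess actually uses.
    print(describe_interpreter(sys.executable, ["numpy", "ray"]))
```

For the repro, comparing `describe_interpreter("python", ["numpy", "ray"])` with the main process's own interpreter would show whether the two differ. In a PyInstaller build, `sys.executable` is the .exe itself, so the path of the bundled interpreter has to be passed instead.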

Would you like more detail or a step-by-step breakdown?

Neither explicitly importing numpy in the subprocess before Ray initialization nor using runtime_env appears to solve this issue.

If a separate command prompt is opened manually, Python is started in it, and 'import ray; _ = ray.init()' is executed there, the rest of the script works perfectly ('Test failed' is never printed).

Similarly, if the script is modified to open a command prompt in a new console with process = subprocess.Popen('cmd.exe', creationflags=subprocess.CREATE_NEW_CONSOLE), and the main program sleeps while the Ray initialization commands are typed manually into that console before it resumes, the failure also does not occur.

The failure only occurs when the initialization commands are sent to the subprocess programmatically. It is unclear why redirecting stdin/stdout should or could affect Ray initialization, but that is the only clear difference.
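To make that difference concrete from the child's side, a small probe can ask a child interpreter whether its stdin is a terminal. This sketch (`child_stdin_is_tty` is a hypothetical helper using only the standard library) only shows that `stdin=subprocess.PIPE` hands the child a pipe instead of the console; it does not explain why Ray reacts to that.

```python
import subprocess
import sys

# One-line program: report whether the child's stdin is an interactive terminal.
# sys.stdin can be None (e.g. under pythonw), which is reported as False.
_STDIN_PROBE = "import sys; print(sys.stdin is not None and sys.stdin.isatty())"


def child_stdin_is_tty(**popen_kwargs):
    """Start a child interpreter with the given redirection options
    (e.g. stdin=subprocess.PIPE) and return what it reports for stdin.isatty()."""
    completed = subprocess.run(
        [sys.executable, "-c", _STDIN_PROBE],
        stdout=subprocess.PIPE,
        check=True,
        **popen_kwargs,
    )
    return completed.stdout.decode().strip() == "True"
```

Run from a console, `child_stdin_is_tty()` with no redirection should report True, like the manual launches, while `child_stdin_is_tty(stdin=subprocess.PIPE)` reports False, like the automated one.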

Working on the assumption that the stdin/stdout redirection was what broke things, I tried removing it. The issue seems tied to stdin alone: passing the command with -c instead of writing it to stdin, while still piping stdout, appears to solve it (at least outside of PyInstaller):

```python
process = subprocess.Popen(
    ['python', '-ic', 'import ray; _ = ray.init()'],
    stdout=subprocess.PIPE,
    creationflags=subprocess.CREATE_NO_WINDOW,
)
time.sleep(30)
```
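The fixed `time.sleep(30)` is a guess at how long startup takes. A minimal sketch of an alternative, assuming the child is changed to print a marker line once `ray.init()` returns: the parent reads the piped stdout until that marker appears, and keeps draining it afterwards, since a pipe that is never read can fill up and block the process writing to it. `READY_MARKER`, `launch_ray_head` and `watch_for_marker` are hypothetical names, not Ray APIs, and whether this behaves the same inside PyInstaller is an open question.

```python
import subprocess
import sys
import threading

# Hypothetical marker the child prints once ray.init() has returned.
READY_MARKER = "RAY_READY"

# Same idea as the workaround: the command is passed with -c rather than
# written to stdin, and -i keeps the interpreter (and the Ray head it started)
# alive afterwards. The extra print is the assumed readiness signal.
RAY_HEAD_CODE = f"import ray; _ = ray.init(); print({READY_MARKER!r}, flush=True)"


def launch_ray_head(python_exe=None):
    """Start a child interpreter that runs RAY_HEAD_CODE; stdin is not redirected.

    In a PyInstaller build sys.executable is the .exe itself, so pass the
    path of the separately included interpreter explicitly there.
    """
    return subprocess.Popen(
        [python_exe or sys.executable, "-ic", RAY_HEAD_CODE],
        stdout=subprocess.PIPE,
        # CREATE_NO_WINDOW exists only on Windows; 0 means "no flags" elsewhere.
        creationflags=getattr(subprocess, "CREATE_NO_WINDOW", 0),
    )


def watch_for_marker(proc, marker=READY_MARKER):
    """Drain proc.stdout in a daemon thread and return an Event that is set
    once a line containing `marker` appears. Reading continues after that, so
    the child never blocks on a full, unread stdout pipe."""
    seen = threading.Event()

    def reader():
        for raw in proc.stdout:  # binary lines until the child closes stdout
            if marker in raw.decode("utf-8", errors="replace"):
                seen.set()

    threading.Thread(target=reader, daemon=True).start()
    return seen
```

In the script this would replace the fixed wait: start the head with `head = launch_ray_head(...)` (passing the bundled interpreter's path), then raise an error if `watch_for_marker(head).wait(timeout=60)` returns False, and only then call `ray.init(address='auto')`. The 60-second timeout is arbitrary. One caveat worth checking: after the -c code runs, -i reads from the inherited stdin, so if the parent has no usable stdin (e.g., a windowed PyInstaller build), the interpreter may hit end-of-file and exit, taking the Ray head with it.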