Starting a Ray worker with Python subprocess does not work properly

How severely does this issue affect your experience of using Ray?

  • Medium: It makes completing my task significantly harder, but I can work around it.

Hi,
I’m starting the workers of a Ray cluster with Python’s subprocess module so that I can control them from Python. This way, a constantly running Python service can start and stop the worker.

process = subprocess.Popen(f"cmd.exe /K ray start --address={RAY_HEAD_ADDRESS} --node-ip-address={WORKER_IP}",
                           env=env,
                           cwd=working_directory, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
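The call above passes subprocess.PIPE for stdin, stdout and stderr. If nothing in the service reads those pipes, any process writing to them blocks once the OS pipe buffer is full, and processes started from that shell may inherit the same handles. Whether that is what stalls the workers here is an assumption, not something the log confirms. If the output does need to be captured in Python, a minimal sketch that keeps draining it on a background thread (`popen_drained` and `_drain` are hypothetical helper names):

```python
import subprocess
import threading
from typing import IO, List, Tuple


def _drain(stream: IO[bytes], sink: List[bytes]) -> None:
    # Read until EOF so the pipe's OS buffer never fills up and blocks a writer.
    for line in iter(stream.readline, b""):
        sink.append(line)
    stream.close()


def popen_drained(cmd: List[str], **popen_kwargs) -> Tuple[subprocess.Popen, threading.Thread, List[bytes]]:
    # stdin is DEVNULL because nothing ever writes to it; stderr is merged into stdout
    # so a single reader thread is enough.
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, **popen_kwargs)
    lines: List[bytes] = []
    # Daemon thread: if grandchildren keep the pipe open, it must not block interpreter exit.
    reader = threading.Thread(target=_drain, args=(proc.stdout, lines), daemon=True)
    reader.start()
    return proc, reader, lines
```

Usage would look like `proc, reader, lines = popen_drained(["ray", "start", f"--address={RAY_HEAD_ADDRESS}", f"--node-ip-address={WORKER_IP}"], env=env, cwd=working_directory)`. If processes spawned by `ray start` inherit the pipe, the reader only reaches EOF once all of them exit, which is why the thread is a daemon.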

The worker connects to the cluster, but as soon as Ray tries to deploy something to that worker, I get this error:

(raylet, ip=10.14.82.48) [2023-06-30 11:34:31,103 E 18216 16928] (raylet.exe) worker_pool.cc:544: Some workers of the worker process(18504) have not registered within the timeout. The process is still alive, probably it’s hanging during start.

It works when I start the worker by running the same command directly from the command line.

Does the Python subprocess somehow interfere with the raylet because the command prompt is a child process of Python?
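One way to test that hypothesis is to remove both the cmd.exe /K wrapper and the unread pipes: pass the command as an argument list and send all output to a log file, which never blocks a writer the way a full pipe does. A minimal sketch under that assumption (`build_ray_start_cmd` and `launch_logged` are hypothetical helpers; the `ray start` flags are the ones already used in the original call):

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ray_start_cmd(head_address: str, node_ip: str) -> List[str]:
    # Same flags as the original call, but as an argument list: no cmd.exe /K
    # shell sits between the Python service and `ray start`.
    return ["ray", "start", f"--address={head_address}", f"--node-ip-address={node_ip}"]


def launch_logged(cmd: List[str], log_path: Path,
                  env: Optional[dict] = None, cwd: Optional[str] = None) -> subprocess.Popen:
    log_file = open(log_path, "ab")  # append mode, so restarts keep earlier output
    try:
        return subprocess.Popen(
            cmd,
            env=env,
            cwd=cwd,
            stdin=subprocess.DEVNULL,  # nothing is ever written to the child's stdin
            stdout=log_file,           # a regular file never fills up like an unread pipe
            stderr=subprocess.STDOUT,  # merge stderr into the same log
        )
    finally:
        # The child received its own copy of the handle, so the parent can close its copy.
        log_file.close()
```

For example: `launch_logged(build_ray_start_cmd(RAY_HEAD_ADDRESS, WORKER_IP), Path("ray_worker.log"), env=env, cwd=working_directory)`. If the workers register normally when started this way, that points at the pipes rather than at Python itself.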

Is there a better way to start and control the worker from Python?
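A possible pattern, sketched under assumptions: instead of keeping a cmd.exe process alive, the service calls the `ray` CLI to bring the node up (`ray start` without `--block` returns once the node processes are launched) and `ray stop` to tear it down. Note that `ray stop` stops all Ray processes on the machine, not only this node's. `RayWorkerController` is a hypothetical name, and the injectable `runner` exists only so the control logic can be tested without Ray installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


class RayWorkerController:
    """Hypothetical helper: start/stop this machine's Ray worker node via the `ray` CLI."""

    def __init__(self, head_address: str, node_ip: str, log_path: Path,
                 runner: Callable[..., subprocess.CompletedProcess] = subprocess.run):
        self.head_address = head_address
        self.node_ip = node_ip
        self.log_path = log_path
        self._run = runner  # injectable so the logic can be tested without Ray

    def _call(self, cmd: List[str], check: bool) -> subprocess.CompletedProcess:
        # Output goes to a log file rather than a pipe: processes spawned by
        # `ray start` may keep inherited handles open after the CLI exits.
        with open(self.log_path, "ab") as log:
            return self._run(cmd, stdin=subprocess.DEVNULL, stdout=log,
                             stderr=subprocess.STDOUT, check=check)

    def start(self) -> subprocess.CompletedProcess:
        # Without --block, `ray start` launches the node processes and then returns.
        return self._call(["ray", "start", f"--address={self.head_address}",
                           f"--node-ip-address={self.node_ip}"], check=True)

    def stop(self) -> subprocess.CompletedProcess:
        # `ray stop` stops every Ray process on this machine.
        return self._call(["ray", "stop"], check=False)
```

In the service this would be `ctl = RayWorkerController(RAY_HEAD_ADDRESS, WORKER_IP, Path("ray_worker.log"))`, then `ctl.start()` when the node should join the cluster and `ctl.stop()` when it should leave. Using `subprocess.run` means each call waits for the CLI to finish, so the service never holds a long-lived shell process.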

I appreciate any help!