How severely does this issue affect your experience of using Ray?
- Medium: It makes my task significantly harder to complete, but I can work around it.
Hi,
I'm starting the worker nodes of a Ray cluster with Python's subprocess module so that I can control them from Python. This way I can start and stop a worker from within a continuously running Python service:
```python
import subprocess

process = subprocess.Popen(f"cmd.exe /K ray start --address={RAY_HEAD_ADDRESS} --node-ip-address={WORKER_IP}",
                           env=env,
                           cwd=working_directory, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE)
```
The worker connects to the cluster, but as soon as I start deploying work and Ray tries to place something on that worker, I get this error:
```
(raylet, ip=10.14.82.48) [2023-06-30 11:34:31,103 E 18216 16928] (raylet.exe) worker_pool.cc:544: Some workers of the worker process(18504) have not registered within the timeout. The process is still alive, probably it's hanging during start.
```
The same command works when I run it directly from the command line.
Could the Python subprocess somehow interfere with the raylet because the command prompt is a child process of Python?
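One concrete way a subprocess setup can differ from an interactive command line: the launch code passes `stdout=subprocess.PIPE` and `stderr=subprocess.PIPE` but never reads them, and the Python `subprocess` docs warn that a child producing enough output to fill the OS pipe buffer will block until someone reads it. Whether this is what makes the raylet's workers hang here is an assumption; the log does not confirm it. The self-contained sketch below (no Ray involved) shows the effect with a hypothetical child that writes about 1 MB to stdout; `CHILD_CODE`, `child_blocks_on_unread_pipe`, and `child_finishes_when_drained` are illustrative names only.

```python
import subprocess
import sys

# Hypothetical stand-in for a chatty process: it writes ~1 MB to stdout,
# which is far more than a typical OS pipe buffer can hold.
CHILD_CODE = "import sys; sys.stdout.write('x' * 1_000_000); sys.stdout.flush()"


def child_blocks_on_unread_pipe(timeout: float = 2.0) -> bool:
    """Return True if the child is still alive after `timeout` seconds
    because nothing reads its stdout pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,  # pipes are created but never read below
        stderr=subprocess.PIPE,
    )
    try:
        proc.wait(timeout=timeout)  # waiting without reading proc.stdout
        return False
    except subprocess.TimeoutExpired:
        return True  # child is stuck writing into a full pipe
    finally:
        proc.kill()
        proc.communicate()  # drain and close the pipes


def child_finishes_when_drained(timeout: float = 10.0) -> bool:
    """Same child, but communicate() keeps reading the pipe, so it can exit."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, _ = proc.communicate(timeout=timeout)
    return proc.returncode == 0 and len(out) == 1_000_000
```

The first function should report a blocked child and the second a finished one; the only difference is whether the parent drains the pipe.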
Is there a better way to start and control the worker from Python?
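A minimal sketch of one possible alternative, assuming the worker machine has `ray` on its `PATH` and the service only needs start/stop control. It runs `ray start` directly instead of through `cmd.exe /K`, adds Ray's `--block` flag so the command keeps running instead of returning once the node is up, and writes output to a log file rather than unread pipes. `build_ray_start_cmd`, `WorkerNode`, and the `stop_cmd` parameter are hypothetical names, and whether this avoids the registration timeout above is untested. Terminating the `ray start` process alone may leave the raylet and its workers running (on Windows, `Popen.terminate()` does not reach child processes), so the sketch can optionally run `ray stop` afterwards; note that `ray stop` stops all Ray processes on that machine.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import IO, Optional


def build_ray_start_cmd(head_address: str, node_ip: str) -> list[str]:
    """Argv for `ray start`, run directly instead of through `cmd.exe /K`."""
    return [
        "ray",
        "start",
        f"--address={head_address}",
        f"--node-ip-address={node_ip}",
        # Keep `ray start` running instead of returning once the node is up,
        # so the Popen handle stays alive as long as the command does.
        "--block",
    ]


class WorkerNode:
    """Hypothetical helper that starts/stops one node from a long-running service."""

    def __init__(
        self,
        cmd: list[str],
        log_path: str | Path,
        env: Optional[dict[str, str]] = None,
        cwd: Optional[str] = None,
        stop_cmd: Optional[list[str]] = None,  # e.g. ["ray", "stop"]
    ) -> None:
        self.cmd = cmd
        self.log_path = Path(log_path)
        self.env = env
        self.cwd = cwd
        self.stop_cmd = stop_cmd
        self.proc: Optional[subprocess.Popen] = None
        self._log: Optional[IO[bytes]] = None

    def is_running(self) -> bool:
        return self.proc is not None and self.proc.poll() is None

    def start(self) -> None:
        if self.is_running():
            return  # already started
        if self._log is not None:
            self._log.close()  # leftover handle from a process that exited on its own
        self._log = open(self.log_path, "ab")
        self.proc = subprocess.Popen(
            self.cmd,
            env=self.env,
            cwd=self.cwd,
            stdin=subprocess.DEVNULL,  # no input pipe for anything to wait on
            stdout=self._log,  # output goes to a file, so no pipe can fill up
            stderr=subprocess.STDOUT,  # merge stderr into the same log
        )

    def stop(self, timeout: float = 30.0) -> None:
        if self.is_running():
            self.proc.terminate()
            try:
                self.proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                self.proc.kill()
                self.proc.wait()
        if self.stop_cmd:
            # Optional cleanup command for processes terminate() did not reach.
            subprocess.run(
                self.stop_cmd, env=self.env, cwd=self.cwd,
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False,
            )
        if self._log is not None:
            self._log.close()
            self._log = None
```

In the setup described here, usage might look like `WorkerNode(build_ray_start_cmd(RAY_HEAD_ADDRESS, WORKER_IP), "ray_worker.log", env=env, cwd=working_directory, stop_cmd=["ray", "stop"])`, followed by `start()` and, later, `stop()`.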
I appreciate any help!