Processes get stuck

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Our project needs to scale from scheduling 100 processes on 1 machine to 1000 processes on 10 machines.
Each process runs a C++ executable binary with different parameters.
Here is a sample of the code.

from multiprocessing import Pool
import subprocess

def run_binary(params):
    # binary_path, output_fd, and error_fd are defined elsewhere in the project.
    subprocess.run([binary_path, params], stdout=output_fd, stderr=error_fd)

with Pool(100) as p:
    p.map(run_binary, params)
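
For reference, a self-contained version of the single-machine setup might look like the sketch below. The names `BINARY_PATH`, `OUTPUT_DIR`, and `TIMEOUT_S` are hypothetical stand-ins (the post does not show the real values), and the Python interpreter is used in place of the C++ binary so the sketch runs anywhere. The timeout is an addition, not part of the original code; it is there so that one hung run cannot block the whole pool.

```python
import subprocess
import sys
from multiprocessing import Pool
from pathlib import Path

# Hypothetical stand-ins; the original post does not show these values.
BINARY_PATH = sys.executable   # replace with the path to the C++ binary
OUTPUT_DIR = Path("run_logs")  # one .out/.err file pair per task
TIMEOUT_S = 600                # assumed per-run limit, in seconds


def run_binary(task):
    """Run one invocation; return (index, return code), or (index, None) on timeout."""
    index, param = task
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    out_path = OUTPUT_DIR / f"{index}.out"
    err_path = OUTPUT_DIR / f"{index}.err"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        try:
            done = subprocess.run(
                [BINARY_PATH, param], stdout=out, stderr=err, timeout=TIMEOUT_S
            )
            return index, done.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            return index, None


def main():
    params = ["--version", "--version"]  # placeholder parameters
    with Pool(2) as p:
        results = p.map(run_binary, list(enumerate(params)))
    print(results)


if __name__ == "__main__":
    main()
```

Writing each run to its own file avoids sharing one file descriptor across worker processes, which also matters once the workers live on different machines.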

I tried changing the code to something like this.

    from ray.util.multiprocessing import Pool

    with Pool(ray_address='ray://head_server_Ip:10001') as p:
        p.map(run_binary, params)

I tried to follow the "Environment Dependencies" page of the Ray 1.13.0 documentation to configure the path for the binary and the local environment.

The issue I ran into is that the code gets stuck while syncing the environment, and the binary never runs on the workers, even after I waited for 20 minutes.
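
The post does not show how the environment was configured, so the following is only a sketch of one common arrangement: pass a `runtime_env` with a `working_dir` to `ray.init`, then create the pool. The helper names `build_runtime_env` and `run_on_cluster` are hypothetical. It assumes that `ray.util.multiprocessing.Pool` reuses a Ray connection that is already initialized, and that the binary sits inside `working_dir`. Since the whole directory is uploaded to the cluster, a large directory is one possible cause of a slow sync; the `excludes` key can keep build artifacts and data out of the upload.

```python
def build_runtime_env(working_dir, excludes=()):
    """Return a runtime_env dict that ships working_dir to the workers."""
    env = {"working_dir": working_dir}
    if excludes:
        # Patterns for files that should not be uploaded.
        env["excludes"] = list(excludes)
    return env


def run_on_cluster(address, runtime_env, func, items):
    """Connect to the cluster, map func over items, then disconnect."""
    # Imported lazily so build_runtime_env can be used without Ray installed.
    import ray
    from ray.util.multiprocessing import Pool

    ray.init(address=address, runtime_env=runtime_env)
    try:
        # Assumption: Pool picks up the connection made by ray.init above.
        with Pool() as pool:
            return pool.map(func, items)
    finally:
        ray.shutdown()


if __name__ == "__main__":
    # Hypothetical directory layout and exclude patterns.
    env = build_runtime_env("./project", excludes=["*.o", "data/"])
    print(env)
```

A call would then look like `run_on_cluster("ray://head_server_Ip:10001", env, run_binary, params)`, with `run_binary` invoking the binary by a path relative to the working directory.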

@mabodx Could you show the code you used to set up the environment dependencies?

@architkulkarni Is there a way to monitor the progress of installing environment dependencies?

You can monitor the setup logs in runtime_env_setup_*.log or dashboard_agent.log on the Ray cluster nodes to see whether the setup is getting stuck during installation.
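
As a rough sketch of how to check those files, the script below prints the last lines of each matching log on a node. The function name `tail_setup_logs` is hypothetical, and the directory `/tmp/ray/session_latest/logs` is assumed to be the log location; adjust it if your cluster uses a different temp directory.

```python
from pathlib import Path

# Assumed log location on a Ray node; change it if your cluster differs.
DEFAULT_LOG_DIR = Path("/tmp/ray/session_latest/logs")
LOG_PATTERNS = ("runtime_env_setup_*.log", "dashboard_agent.log")


def tail_setup_logs(log_dir=DEFAULT_LOG_DIR, lines=20):
    """Return {file name: last `lines` lines} for the setup-related logs."""
    log_dir = Path(log_dir)
    tails = {}
    for pattern in LOG_PATTERNS:
        for path in sorted(log_dir.glob(pattern)):
            text = path.read_text(errors="replace")
            tails[path.name] = text.splitlines()[-lines:]
    return tails


if __name__ == "__main__":
    for name, tail in tail_setup_logs().items():
        print(f"== {name} ==")
        print("\n".join(tail))
```

Running it a few times on the head node and on a worker node shows whether the setup log is still growing or has stopped at a particular step.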