- Medium: It causes significant difficulty in completing my task, but I can work around it.
I am using one node. I run:
`ray start --head --dashboard-host "0.0.0.0"`
And in the usage section, I see 28 CPUs listed. Great, because I have 28 CPUs.
Now I run `ray stop`, and I see:
Stopped all 7 Ray processes.
As I understand it, Ray should create one process per worker (see the question here, for example). I also double-checked this by running a job, checking the dashboard, and confirming that there were at most 7 workers.
i) Why are there not 28 processes?
ii) How can I make Ray use 28 processes (i.e., one process per CPU)?
The processes you are seeing are long-lived system-level processes that outlive individual Ray jobs. Ray automatically starts workers as needed based on submitted tasks and their resource requirements. Since there are no active jobs, there won’t be any workers.
Once you do submit a job, you should see worker processes start up. The actual number of worker processes is usually roughly equal to the number of CPUs, but it can be higher or lower depending on worker crashes and the specific workload. There are two ways to submit a job:
- Run a Python script that calls `ray.init()`. If you can access the head node, this is the simplest option.
- Submit a packaged Ray job.
Very helpful. Thank you!!