Why are there a lot of ray::IDLE processes in my Ray cluster?

As shown in the figure above, why are there so many ray::IDLE processes in my Ray cluster?

Is there an elegant way to terminate these IDLE workers without affecting the existing Ray Serve deployment?


Starting IDLE workers is expected behavior; it reduces worker startup time, and they will be used as soon as you create Serve replicas. By default, Ray prestarts as many idle workers as there are CPUs on the node when you start a new script.
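For example, you can see this prestart behavior by counting the worker processes whose title is ray::IDLE. A minimal sketch, assuming psutil is installed and that the workers rewrite their process title (which is how they show up in ps/htop):

import time

import psutil
import ray

ray.init()     # start a local cluster; idle workers are prestarted
time.sleep(2)  # give the raylet a moment to spawn them

# Count OS processes whose title starts with "ray::IDLE".
idle = []
for p in psutil.process_iter(["cmdline"]):
    cmdline = p.info["cmdline"] or []
    if cmdline and cmdline[0].startswith("ray::IDLE"):
        idle.append(p)
print(f"CPUs on this node: {psutil.cpu_count()}, ray::IDLE workers: {len(idle)}")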

If you really want to disable prestarted IDLE workers, you can set the environment variable RAY_prestart_worker_first_driver=0 when you create a cluster. E.g.,

RAY_prestart_worker_first_driver=0 ray start --head
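If you are starting Ray from a Python script rather than with ray start, the same variable can be exported before ray.init() is called. This is only a sketch; it assumes the RAY_-prefixed variable is picked up when the local raylet starts, which may differ across Ray versions:

import os

import ray

# Assumption: the RAY_-prefixed env var is read when ray.init() launches the raylet.
os.environ["RAY_prestart_worker_first_driver"] = "0"
ray.init()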

This is a good question. Maybe we could document it in a Q&A section of the Ray OSS website.

@XIE yeah, for sure. I think we need to document the behavior of workers in the core documentation. We’ve been talking about this for a while, but it hasn’t happened yet.

This environment variable does not disable ray::IDLE workers completely; they are only disabled when the cluster is started for the first time.

It also does not resolve hanging ray::IDLE workers that are left behind after a run.
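One blunt workaround is to terminate the leftover processes at the OS level. This is only a sketch, and it assumes the ray::IDLE title really means the worker is idle and that nothing you still need (e.g. Ray Serve replicas) is running inside it:

import psutil

# Terminate worker processes whose title is ray::IDLE.
# Use with care on a cluster that is still serving traffic.
for proc in psutil.process_iter(["cmdline"]):
    cmdline = proc.info["cmdline"] or []
    if cmdline and cmdline[0].startswith("ray::IDLE"):
        proc.terminate()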

This might be quite a random contribution, but I noticed I had several ray::IDLE processes on my machine.
It turns out that if you have htop open when you start your Ray processes, they show up as ray::IDLE even while they are computing something.
When I quit htop and restarted it, it showed my Ray processes with the proper ray::function_name.
Whenever a worker process died and was restarted, it also showed up as ray::IDLE in htop (even while computing something) until I restarted htop.
Maybe this is what happened to you?

Thanks for the response, but unfortunately that is not what’s happening here. The Ray Dashboard also shows ray::IDLE workers that are not computing anything, since the submitted job has finished. But when a new job is started, they either prevent new tasks from being scheduled or the job flat-out crashes with an OOM error.