Why are there a lot of ray::IDLE processes in my Ray cluster?

As shown in the figure above, why are there so many ray::IDLE processes in my Ray cluster?

Is there an elegant way to terminate these IDLE workers without affecting the existing Ray Serve deployment?


Starting IDLE workers is expected behavior; it reduces worker startup time, and they will be used as soon as you create Serve replicas. By default, Ray prestarts as many idle workers as there are CPUs on the node when you start a new script.
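For example, you can see this prestart behavior by counting the worker processes whose title is ray::IDLE. A minimal sketch, assuming psutil is installed and that the workers rewrite their process title (which is how they show up in ps/htop):

import time

import psutil
import ray

ray.init()     # start a local cluster; idle workers are prestarted
time.sleep(2)  # give the raylet a moment to spawn them

# Count OS processes whose title starts with "ray::IDLE".
idle = []
for p in psutil.process_iter(["cmdline"]):
    cmdline = p.info["cmdline"] or []
    if cmdline and cmdline[0].startswith("ray::IDLE"):
        idle.append(p)
print(f"CPUs on this node: {psutil.cpu_count()}, ray::IDLE workers: {len(idle)}")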

If you really want to disable prestarted IDLE workers, you can set the environment variable RAY_prestart_worker_first_driver=0 when you create a cluster. E.g.,

RAY_prestart_worker_first_driver=0 ray start --head
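If you are starting Ray from a Python script rather than with ray start, the same variable can be exported before ray.init() is called. This is only a sketch; it assumes the RAY_-prefixed variable is picked up when the local raylet starts, which may differ across Ray versions:

import os

import ray

# Assumption: the RAY_-prefixed env var is read when ray.init() launches the raylet.
os.environ["RAY_prestart_worker_first_driver"] = "0"
ray.init()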

This is a good question. Maybe we could document it in a Q&A section of the Ray OSS website.

@XIE yeah, for sure. I think we need to document the behavior of workers in the core documentation. We’ve been talking about this for a while, but it hasn’t happened yet.

This environment variable does not disable ray::IDLE workers completely; they are only disabled when the cluster is started for the first time.

It also does not resolve hanging ray::IDLE workers that are left behind after a run.
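One blunt workaround is to terminate the leftover processes at the OS level. This is only a sketch, and it assumes the ray::IDLE title really means the worker is idle and that nothing you still need (e.g. Ray Serve replicas) is running inside it:

import psutil

# Terminate worker processes whose title is ray::IDLE.
# Use with care on a cluster that is still serving traffic.
for proc in psutil.process_iter(["cmdline"]):
    cmdline = proc.info["cmdline"] or []
    if cmdline and cmdline[0].startswith("ray::IDLE"):
        proc.terminate()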

This might be quite a random contribution, but I noticed I had several ray::IDLE processes on my machine.
It turns out that if you have htop open when you start your Ray processes, they show up as ray::IDLE even while they are computing something.
When I quit htop and restarted it, it showed my Ray processes with the proper ray::function_name.
Whenever a worker process died and was restarted, it also showed up as ray::IDLE in htop (even while computing something) until I restarted htop.
Maybe this is what happened to you?

Thanks for the response, but unfortunately that is not what’s happening here. The Ray Dashboard also shows ray::IDLE workers that are not computing anything, since the submitted job has finished. But when a new job is started, they either prevent new tasks from being scheduled or the job flat-out crashes with an OOM error.