Ray head node leaves zombie processes after a job finishes

I run a RayCluster (Ray 2.39.0) using KubeRay (1.2.2) and submit many jobs to it. I discovered that many zombie processes are left behind after the jobs finish.

The zombie processes cause some psutil methods to run very slowly.

Each job I submit leaves 2 zombie processes. In more detail: when I submit a job, a JobSupervisor starts on the head node to hold the job, and the JobSupervisor (pid=152424) runs 2 subprocesses:

  1. /bin/bash -c python numpy-cpu-job-actor.py, pid is 152834
  2. /bin/bash -c while kill -s 0 152424; do sleep 1; done; kill -9 -152824, pid is 152836

When the job finishes, 152424 and 152834 exit, but 152836 and its subprocess remain as zombies: 1) [sh] <defunct>; 2) [sleep] <defunct>
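For context, here is a minimal, Linux-only sketch of how a `<defunct>` entry comes about in general: a child that has exited stays in the process table in state `Z` until its parent waits on it. This illustrates the general mechanism only and is not a diagnosis of what JobSupervisor does; the helper names `proc_state` and `demo_zombie` are made up for this example.

```python
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The format is "pid (comm) state ..."; comm may itself contain spaces or
    # parentheses, so split on the last ")".
    return stat.rsplit(")", 1)[1].split()[0]


def demo_zombie():
    """Spawn a short-lived child, observe it as a zombie, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    state_before = None
    deadline = time.monotonic() + 5.0
    # The child exits almost immediately, but it is not reaped until we call
    # wait(), so its state should settle on "Z" (zombie).
    while time.monotonic() < deadline:
        state_before = proc_state(child.pid)
        if state_before == "Z":
            break
        time.sleep(0.05)
    child.wait()  # reaping removes the entry from the process table
    state_after = proc_state(child.pid)
    return state_before, state_after


if __name__ == "__main__":
    print(demo_zombie())
```

In the case above the parent (152424) has already exited, so the leftover processes can only be reaped by whichever process they get re-parented to.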

Code of numpy-cpu-job-actor.py is

import ray
import numpy as np
import datetime

t0 = datetime.datetime.now()
formatted_time = t0.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
print("Starting at ", formatted_time)

ray.init()
t1 = datetime.datetime.now()
formatted_time = t1.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
print("Ray initialized at ", formatted_time)

@ray.remote
def cpu_intensive_task():
    result = 0
    tt1 = datetime.datetime.now()
    print("Start at ", tt1.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3])
    for _ in range(int(5e6)):
        result += np.random.rand()
    tt2 = datetime.datetime.now()
    formatted_time1 = tt2.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
    print("Finished at %s, cost %.2f second." % (formatted_time1, (tt2-tt1).total_seconds()))
    return result


t2 = datetime.datetime.now()
formatted_time = t2.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
print("Placement group ready at ", formatted_time)


t3 = datetime.datetime.now()
formatted_time = t3.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
print("Actor scheduled at ", formatted_time)

result_ids = [cpu_intensive_task.options().remote() for _ in range(2)]
t4 = datetime.datetime.now()
formatted_time = t4.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
print("Actor task scheduled at ", formatted_time)

try:
    results = ray.get(result_ids)
    t5 = datetime.datetime.now()
    formatted_time = t5.strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
    print("Finished at ", formatted_time)
    print("Result: ")
    print(results)
    print("t0 - t1 - t2 - t3 - t4 - t5: %.2f - %.2f - %.2f - %.2f - %.2f" % ((t1-t0).total_seconds(), (t2-t1).total_seconds(), (t3-t2).total_seconds(), (t4-t3).total_seconds(), (t5-t4).total_seconds()))
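except KeyboardInterrupt:
    print("Terminated.")
finally:
    ray.shutdown()
    # Take a fresh timestamp; formatted_time would still hold an earlier value here.
    shutdown_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S,%f")[:-3]
    print("Ray shutdown at ", shutdown_time)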

And the submit command is ray job submit --working-dir . -- python numpy-cpu-job.py

I’m wondering whether I did something wrong that caused this, or whether this is a bug in Ray?

Hi Wangxin,
The documentation we have on Lifetime of a User-Spawn Process might be helpful here. Ray has two environment variables that you can set to help kill off these processes, specifically:

  • RAY_kill_child_processes_on_worker_exit (default is true)
  • RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper (default is false): only works on Linux 3.4 or later. If true, the Raylet recursively kills any child and grandchild processes that were spawned by the worker after the worker exits. This works even if the worker crashed. The killing happens within 10 seconds of the worker's death.

However, these only work on Linux. On non-Linux platforms, user-spawned processes are not controlled by Ray, and the user is responsible for managing the lifetime of child processes. If the parent Ray worker process dies, the child processes will continue to run.
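As a rough sketch of turning on the subreaper option for a head node you launch yourself: my understanding is that the variable needs to be in the environment of the Ray processes when they start. The helper name `ray_env_with_subreaper` is made up for illustration, and the `ray start --head` launch is left commented out so the snippet runs without a cluster. With KubeRay you would presumably set the same variable in the head container's environment instead.

```python
import os

SUBREAPER_VAR = "RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper"


def ray_env_with_subreaper(base_env=None):
    """Return a copy of the environment with the subreaper option enabled.

    Hypothetical helper for illustration; it does not modify os.environ
    or the mapping passed in.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[SUBREAPER_VAR] = "true"
    return env


if __name__ == "__main__":
    env = ray_env_with_subreaper()
    print(SUBREAPER_VAR, "=", env[SUBREAPER_VAR])
    # Example launch of a local head node with the option set (requires Ray):
    # import subprocess
    # subprocess.run(["ray", "start", "--head"], env=env, check=True)
```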

Are you running on Linux or something else? If you’re on Windows or Mac, there are a few Python libraries, such as psutil, that might help you write a custom script to clean up zombie processes.
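A minimal sketch of the psutil idea, assuming psutil is installed: it only lists zombies together with their parent pids, because a zombie cannot be killed directly and only disappears once its parent (or whatever process adopted it) reaps it. The function names `find_zombies` and `snapshot` are made up for this example.

```python
ZOMBIE = "zombie"  # same string value as psutil.STATUS_ZOMBIE


def find_zombies(records):
    """Return the process records (dicts) whose status is zombie."""
    return [r for r in records if r.get("status") == ZOMBIE]


def snapshot():
    """Collect pid, ppid, name and status for every visible process."""
    import psutil  # third-party; imported lazily so find_zombies works without it

    return [p.info for p in psutil.process_iter(["pid", "ppid", "name", "status"])]


if __name__ == "__main__":
    for rec in find_zombies(snapshot()):
        # The ppid shows which process would have to reap this zombie.
        print(f"zombie pid={rec['pid']} name={rec['name']} parent={rec['ppid']}")
```

If the output shows many zombies sharing one parent, that parent is the process that is not reaping its children.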


Thanks for your reply. Setting RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper to true fixed my issue: the zombie processes no longer remain after the job finishes.
