Hello, setting RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper to true resolves my issue, but it causes another problem:
the Ray job is marked as succeeded even when the Python script fails.
See "Unexpected job status - #2 by wangxin201492" for more details.
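For reference, the behavior I expect can be sketched as below, assuming the job status should simply follow the entrypoint script's exit code. This is a minimal, hypothetical illustration: `run_entrypoint` and the status strings are made up for the example and do not reflect how Ray's JobSupervisor is actually implemented.

```python
import subprocess
import sys

# Hypothetical status strings for illustration; not taken from Ray's source.
SUCCEEDED = "SUCCEEDED"
FAILED = "FAILED"


def run_entrypoint(cmd: list[str]) -> str:
    """Run an entrypoint command and derive a job status from its exit code.

    Hypothetical helper: a nonzero exit code from the script should map to
    FAILED, never to SUCCEEDED.
    """
    # subprocess.run waits for the child to exit and reaps it, so the
    # finished process does not linger as a zombie.
    result = subprocess.run(cmd)
    return SUCCEEDED if result.returncode == 0 else FAILED


if __name__ == "__main__":
    ok = run_entrypoint([sys.executable, "-c", "pass"])
    bad = run_entrypoint([sys.executable, "-c", "raise SystemExit(1)"])
    print(ok, bad)
```

With the subreaper setting enabled, my job ends up reported like the first case even though the script behaves like the second.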
Is there a better solution for this issue? Note: the zombie process is generated by JobSupervisor.