Killing driver does not kill tasks in Ray on minikube

asm582 · April 19, 2021, 10:05pm

Hello,

I am using a sample program from one of the Ray tutorials and running the code snippet below from the CLI on the Ray head node:

import time
import ray

@ray.remote
def f():
    time.sleep(2)

# call ray.init() here

result = ray.get([f.remote() for _ in range(1500)])

When I execute the above on the head node CLI, a demand for 1500 CPUs is created. My minikube environment does not have 1500 CPUs, so I assume the tasks are queued and executed in an order that Ray decides.
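To get a feel for how long such a backlog keeps running, here is a rough back-of-the-envelope sketch. It assumes each task reserves one CPU (Ray's default for tasks) and ignores scheduling overhead. The function name and the CPU counts are hypothetical, since the thread does not say how many CPUs the minikube cluster has.

```python
import math

def estimated_wall_time(num_tasks, task_seconds, num_cpus):
    """Rough lower bound on total runtime when tasks run in waves of num_cpus."""
    # Tasks that do not fit on the available CPUs wait for a free slot,
    # so the work proceeds in ceil(num_tasks / num_cpus) waves.
    waves = math.ceil(num_tasks / num_cpus)
    return waves * task_seconds

# 1500 two-second tasks on a hypothetical 4-CPU cluster
print(estimated_wall_time(1500, 2, 4))
```

With only a handful of CPUs the backlog runs for many minutes, which leaves plenty of time to notice tasks still executing after the CLI command appears to be gone.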

When I exit the CLI command with ctrl+Z while the 1500 tasks are still running, I expect the remaining tasks to be terminated, but that is not the case. The tasks keep executing even after I have killed the CLI command.

When I resubmit the 1500 tasks from the CLI, I am not sure in what order they are executed. Will Ray finish the tasks from the killed command before executing the new ones?

Can you please comment on whether this is expected?

Dmitri · April 20, 2021, 3:28pm

Sounds like a bug
@sangcho exiting the driver should cancel pending tasks, right?

sangcho · April 20, 2021, 6:22pm

Did you call ray.init() or ray.init(address='auto')?

asm582 · April 20, 2021, 6:26pm

I think I called ray.init(address='auto')

sangcho · April 20, 2021, 6:34pm

> The tasks keep on executing even when I have killed the CLI command.

Does that mean the tasks kept executing until all of them were finished?

sangcho · April 20, 2021, 6:35pm

My expectation is that when the driver exits, pending tasks shouldn’t be executed. cc @Alex Can you follow up if my understanding is correct?

asm582 · April 20, 2021, 6:35pm

Yes, they kept executing even after the driver was killed.

Alex · April 20, 2021, 7:04pm

@asm582 I think the issue here is that you are intending to kill the program, but you’re actually stopping it instead.

In general, ctrl+z sends SIGTSTP to the program, which only suspends it, so the tasks remain because you could always resume the program by running fg in the same shell or kill -SIGCONT <pid>.
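The difference between a suspended and a terminated process can be seen with a small standard-library sketch (Unix only). It does not involve Ray: a sleeping child process stands in for the driver, and SIGSTOP is used as the uncatchable counterpart of the stop signal that ctrl+z delivers. The function name is made up for illustration.

```python
import os
import signal
import subprocess
import sys

def suspend_and_resume():
    """Stop a child process, check that it still exists, then resume it."""
    # Stand-in for a driver: a child process that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # Suspend the child, similar to what ctrl+z does to a foreground job.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a child that stopped without exiting.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        # poll() returns None while the process has not exited.
        still_alive = child.poll() is None
        # Resume the child, as `kill -SIGCONT <pid>` would.
        os.kill(child.pid, signal.SIGCONT)
    finally:
        # End the child so the example leaves nothing behind.
        child.kill()
        child.wait()
    return stopped, still_alive

print(suspend_and_resume())
```

A stopped process is reported as stopped yet has not exited, which is why anything it owns (here, the submitted tasks) is not cleaned up.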

I think what you’re looking for is ctrl+c, which sends SIGINT; that actually terminates the process and triggers the cleanup.
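As a minimal sketch of what "triggering the cleanup" can look like in a Python program, the snippet below installs one handler for both SIGINT (what ctrl+c delivers) and SIGTERM. This is not Ray's actual shutdown code. The cleanup function and the list it writes to are hypothetical placeholders, and a real driver would release its resources and exit inside the handler.

```python
import os
import signal

cleaned_up = []  # records which signals triggered the (placeholder) cleanup

def cleanup(signum, frame):
    # Hypothetical cleanup hook: a real program would release resources
    # and then exit here. This sketch only records the signal name.
    cleaned_up.append(signal.Signals(signum).name)

def install_handlers():
    # Route both termination-style signals to the same cleanup hook.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, cleanup)

def send_to_self(sig):
    # Deliver a signal to the current process, as the terminal or `kill` would.
    os.kill(os.getpid(), sig)
```

Suspending a process runs no such handler, because the process is neither asked to terminate nor allowed to keep running until it is resumed.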

asm582 · April 22, 2021, 10:35pm

Thank you, I can now confirm that with SIGTERM the tasks are killed.
