Gracefully canceling process

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi,

I have a problem when canceling tasks.

Description
I have some tasks that I want to run and then cancel if they run too long (cancellation is triggered dynamically by another task). The tasks themselves call subprocess.Popen() to execute an external script. When I cancel the tasks with ray.cancel(), they only get canceled after the subprocess is done, instead of the subprocess being canceled as well. I.e., the KeyboardInterrupt is only raised after the task that should be killed has finished. I thought about working with signals and killing the subprocess upon receiving one, but somehow I cannot catch the ray.cancel() signal within the task. I have provided an example below, in which I replaced the subprocess with a sleep, since I see the same behavior in that case.

Code

import ray
import time
import random
import math

@ray.remote(num_cpus=1)
def long_process(sleep):
    try:
        print(f"Starting {time.ctime()}, {sleep}")
        time.sleep(sleep)
        print(f"Finished {time.ctime()}, {sleep}")
    except KeyboardInterrupt:
        print(f"Interrupted {time.ctime()}, {sleep}")

@ray.remote(num_cpus=1)
def monitor(sleep, tasks_kill):
    time.sleep(sleep)
    # Request cancellation of every long-running task.
    for t in tasks_kill:
        ray.cancel(t)
    print("Killing processes at:", time.ctime())
    return sleep

rtime = [10, 15]
long_tasks = [long_process.remote(rt) for rt in rtime]
# Use a separate name so the monitor function is not shadowed by its ObjectRef.
monitor_ref = monitor.remote(5, long_tasks)

run = long_tasks + [monitor_ref]
ray.get(run)

Output:

(long_process pid=85820) Starting Sun Mar 20 12:28:17 2022, 10
(long_process pid=85821) Starting Sun Mar 20 12:28:17 2022, 15
(monitor pid=85819) Killing processes at: Sun Mar 20 12:28:22 2022
(long_process pid=85820) Interrupted Sun Mar 20 12:28:27 2022, 10
(long_process pid=85821) Interrupted Sun Mar 20 12:28:32 2022, 15

Problem:
The sleep should be interrupted (canceled) and the process should end immediately. As you can see, the KeyboardInterrupt is only handled after the sleep finishes. Is there some way to catch the KeyboardInterrupt, gracefully shut down the subprocess (sleep), and return from long_process?

Many thanks

I think this is because Python won’t deliver the KeyboardInterrupt until after the sleep is done. You could either break the long_process task into smaller chunks, so that the KeyboardInterrupt can be raised between them, or call ray.cancel(ref, force=True) to kill the long_process worker processes outright.
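
For the subprocess case, the "smaller chunks" idea could look roughly like the sketch below. It is plain Python with no Ray calls, and the helper name run_with_polling and its parameters poll_interval and grace_period are made up for illustration. It assumes, as suggested above, that the KeyboardInterrupt gets a chance to surface between short waits, so I would treat it as a starting point rather than a confirmed fix.

```python
import subprocess
import sys
import time


def run_with_polling(cmd, poll_interval=0.1, grace_period=5.0):
    """Run cmd, waking up every poll_interval seconds instead of blocking.

    Hypothetical helper: the short sleeps are the "chunks" between which
    a pending KeyboardInterrupt can be raised.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Poll instead of calling proc.wait() once for the whole run.
        while proc.poll() is None:
            time.sleep(poll_interval)
        return proc.returncode
    except KeyboardInterrupt:
        # Ask the child to stop, then escalate if it ignores the request.
        proc.terminate()
        try:
            proc.wait(timeout=grace_period)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        # Re-raise so the caller still sees the interruption.
        raise


if __name__ == "__main__":
    # Stand-in for the external script: a child that sleeps for one second.
    code = run_with_polling([sys.executable, "-c", "import time; time.sleep(1)"])
    print("exit code:", code)
```

Inside long_process, the single time.sleep(sleep) (or a blocking wait on the Popen object) would be replaced by a call to this helper. If the interrupt still only arrives after the task is done, ray.cancel(ref, force=True) remains the fallback.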