Hi all, my use case is running Ray interactively from a Jupyter notebook.
Let’s say I have submitted a lot of tasks to the remote cluster, then realize my code is wrong and interrupt the notebook to stop the current execution. After that, I find that the worker nodes are still working on the tasks I submitted.
Is there any way to clear the pending tasks or jobs? Or just kill individual tasks?
One workaround I tried was wrapping the call in try-except, catching the KeyboardInterrupt, and calling ray.cancel(task_ref), but I don’t think it’s a graceful approach.
You can cancel a task with ray.cancel(object_ref, force=False, recursive=False), which I believe you are already using. The object_ref parameter is the reference to the task you want to cancel. The force parameter, if set to True, causes the executing task to exit immediately instead of being interrupted. If recursive is set to True, it also cancels any child tasks that the task has submitted.
Were you asking if there is a CLI way to kill tasks?
ray summary tasks only reports the current status of the tasks; it does not give you a way to kill them.
Thank you for your reply. The reason I’m asking is that I’m wondering how to handle a job that hangs or gets stuck. I expect there is some mechanism for the user to force-kill a task when it is stuck.