Hi all, my use case is running Ray interactively from a Jupyter notebook.
Let’s say I have submitted a lot of tasks to the remote cluster, then realized that my code is wrong, so I interrupted the notebook to stop the current execution. After that, I found that the worker nodes were still working on the tasks I had submitted.
Is there any way to clear the pending tasks or jobs, or to kill individual tasks?
One workaround I tried was wrapping the call in a try-except, catching the KeyboardInterrupt, and calling `ray.cancel(task_ref)`, but I don’t think that’s a graceful approach.
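For readers following along, the workaround described above might look roughly like the sketch below. The helper name `get_or_cancel`, the `ray_module` hook (there only so the helper can be tried against a stand-in object), and the `demo` function are all made up for illustration; only `ray.get`, `ray.cancel`, `ray.init`, and `@ray.remote` are actual Ray calls. It assumes the refs come from plain remote tasks rather than actor methods.

```python
def get_or_cancel(refs, ray_module=None):
    """Wait for all refs; if the driver is interrupted, cancel them and re-raise.

    ray_module is a hypothetical hook for passing a stand-in object;
    when left as None the real ray package is imported.
    """
    if ray_module is None:
        import ray as ray_module
    try:
        return ray_module.get(refs)
    except KeyboardInterrupt:
        # Interrupting a notebook cell only stops the driver; the tasks
        # already submitted keep running unless they are cancelled.
        for ref in refs:
            ray_module.cancel(ref)  # default force=False
        raise


def demo():
    """Illustrative only: requires a working Ray installation."""
    import ray

    ray.init(ignore_reinit_error=True)

    @ray.remote
    def slow_task(i):
        import time
        time.sleep(30)
        return i

    refs = [slow_task.remote(i) for i in range(100)]
    return get_or_cancel(refs)
```

Calling `demo()` in a cell and then interrupting the kernel should send a cancel request for each outstanding task before the interrupt propagates.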
@Kelvinyu1117
There is `ray.cancel(object_ref, force=False, recursive=False)`, which I believe you are already using.
The `object_ref` parameter is the reference to the task you want to cancel. The `force` parameter, if set to `True`, will cause the task to be cancelled immediately. If `recursive` is set to `True`, it will also cancel any tasks that the current task has launched.
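As a rough illustration of how these parameters might be combined, the sketch below first asks for a normal cancellation and only escalates to `force=True` if the task has not ended after a grace period. The helper name `cancel_with_escalation`, the `grace_seconds` value, and the `ray_module` hook (for passing a stand-in object) are assumptions made for the example, not part of Ray. `recursive` is passed explicitly so the behavior does not depend on the default in your Ray version. As far as I know, `force=True` is only supported for plain tasks, not actor tasks.

```python
def cancel_with_escalation(ref, grace_seconds=5.0, ray_module=None):
    """Try a normal cancel first, then force-cancel if the task is still running.

    Returns "stopped" if the task ended within the grace period,
    otherwise "force_killed". ray_module is a hypothetical hook for
    passing a stand-in; None means import the real ray package.
    """
    if ray_module is None:
        import ray as ray_module

    # Normal cancel: with force=False a running Python task is interrupted
    # with KeyboardInterrupt, which the task code could catch or ignore.
    ray_module.cancel(ref, force=False, recursive=True)

    # Assumption: ray.wait reports the ref as ready once the task has
    # ended for any reason, including cancellation.
    ready, _ = ray_module.wait([ref], timeout=grace_seconds)
    if ready:
        return "stopped"

    # Escalate: force=True makes the worker running the task exit.
    ray_module.cancel(ref, force=True, recursive=True)
    return "force_killed"
```

Calling `ray.get` on a cancelled ref afterwards is expected to raise an error rather than return a value, so wrap it if you still need to collect results from other tasks.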
Were you asking whether there is a CLI way to kill tasks? `ray summary tasks` gives you the current status of the tasks, but not the ability to kill them.
cc: @rickyyx
Thank you for your reply. The reason I’m asking is that I’m wondering how to handle a job that hangs or gets stuck. I would expect some mechanism that lets the user force-kill a task when it is stuck.
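One possible way to approximate this with the existing API is a driver-side timeout: wait for a batch of refs with `ray.wait`, and force-cancel whatever has not finished in time. This is only a sketch under stated assumptions. The helper name `cancel_stuck`, the `timeout_s` parameter, and the `ray_module` hook (for passing a stand-in object) are invented for the example, and treating "not finished within the timeout" as "stuck" is a simplification that may not suit long-running tasks.

```python
def cancel_stuck(refs, timeout_s, ray_module=None):
    """Wait up to timeout_s for refs, then force-cancel the unfinished ones.

    Returns (done, stuck): the refs that finished in time and the refs
    that were force-cancelled. ray_module is a hypothetical hook for
    passing a stand-in; None means import the real ray package.
    """
    if ray_module is None:
        import ray as ray_module
    if not refs:
        return [], []

    # Ask for all refs, but give up waiting after timeout_s seconds.
    done, stuck = ray_module.wait(refs, num_returns=len(refs), timeout=timeout_s)

    for ref in stuck:
        # force=True exits the worker process instead of raising
        # KeyboardInterrupt inside the task, so a hung task cannot ignore it.
        ray_module.cancel(ref, force=True)
    return done, stuck
```

Something like `done, stuck = cancel_stuck(refs, timeout_s=60)` followed by `ray.get(done)` would then collect only the results that arrived in time.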