Gracefully stopping ray jobs

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

When the ray job stop is called, it looks like a sigterm signal is sent to the driver, but setting a SIGTERM handler in the ray driver script does not seem to work. What is the recommended way to gracefully shut down the job and do some cleanup when ray job stop is called?

If you want to perform cleanup for Train Jobs on stop, you can override the Trainable.cleanup() method in your Trainable class.

For Serve, Ray also provides a ray.shutdown() function to terminate processes started by ray.init(), and ray.serve.shutdown() to completely shut down Serve on the cluster.

I realized that i made a mistake - the sigterm handler was not being registered at the right place. Handling of sigterm works as expected.

@saswatac where did you end up setting the sigterm handler to make it work? I’m having some trouble handling a sigint (which is sent by running ray job stop and using env variable RAY_JOB_STOP_SIGNAL: SIGINT). Putting it in my ray driver script isnt working even though I tried disabling ray tune’s sigint handler (TUNE_DISABLE_SIGINT_HANDLER: “1”)