Gracefully stopping ray jobs

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

When the ray job stop is called, it looks like a sigterm signal is sent to the driver, but setting a SIGTERM handler in the ray driver script does not seem to work. What is the recommended way to gracefully shut down the job and do some cleanup when ray job stop is called?

If you want to perform cleanup for Train Jobs on stop, you can override the Trainable.cleanup() method in your Trainable class.

For Serve, Ray also provides a ray.shutdown() function to terminate processes started by ray.init(), and ray.serve.shutdown() to completely shut down Serve on the cluster.

I realized that i made a mistake - the sigterm handler was not being registered at the right place. Handling of sigterm works as expected.