How to determine time of latest task run

I’m hoping to implement an auto-timeout mechanism to shut down the head after a specified period of inactivity.
Is there a way to determine when the most recently executed task ran on a worker?
Or how many tasks have run on the workers since the cluster started?

Hi @varga, you could implement this at the application layer. Have an actor hold a timer, have each task ping the actor to signal that it is doing work (which resets the timer), and have the actor shut down the cluster if the timer expires. If you don’t need to support multiple workloads from multiple drivers/clients, you could also put this logic in the driver. However, I don’t think there’s a built-in way to do this by introspecting Ray task state.
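A minimal sketch of that actor-based timer, under some assumptions: `IdleTracker`, `run_with_idle_watch`, `work`, and the `on_idle` callback are hypothetical names for illustration, not part of Ray. The tracker is plain Python so it can be tested on its own; `ray.remote(IdleTracker)` wraps it as an actor. How the head node is actually stopped depends on your deployment, so that step is left to the caller-supplied `on_idle`.

```python
import time


class IdleTracker:
    """Records the time of the last task ping and a running task count."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_ping = clock()    # treat creation time as the first activity
        self._task_count = 0

    def ping(self):
        """Called by each task: resets the idle timer and counts the task."""
        self._last_ping = self._clock()
        self._task_count += 1
        return self._task_count

    def idle_seconds(self):
        return self._clock() - self._last_ping

    def task_count(self):
        return self._task_count

    def expired(self):
        return self.idle_seconds() >= self._timeout_s


def run_with_idle_watch(timeout_s, on_idle, poll_s=5.0):
    """Driver-side loop: run some tasks, then call on_idle() once idle.

    on_idle is a hypothetical, deployment-specific callback (assumption):
    it should do whatever stops your head node.
    """
    import ray  # imported lazily so IdleTracker is usable without Ray

    ray.init()  # or ray.init(address="auto") to join an existing cluster
    tracker = ray.remote(IdleTracker).remote(timeout_s)

    @ray.remote
    def work(x, tracker):
        tracker.ping.remote()  # fire-and-forget: report activity
        return x * 2

    print(ray.get([work.remote(i, tracker) for i in range(4)]))

    # Poll the actor until no task has pinged for timeout_s seconds.
    while not ray.get(tracker.expired.remote()):
        time.sleep(poll_s)
    print("tasks seen:", ray.get(tracker.task_count.remote()))
    on_idle()
```

Because tasks ping when they start, a single task that runs longer than the timeout would look idle; pinging again at the end of the task (or periodically inside it) is one way to avoid that.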

cc @sangcho is there any way to do this without such an application-level solution?

Maybe cc @Alex for more ideas (maybe there’s a way to achieve this using the autoscaler?).

I don’t think this is achievable via the autoscaler, but if you’re willing to implement a bunch of custom logic, I believe you could look at the Prometheus metrics to see whether the raylet has executed any tasks recently. For the head node, though, I think you would have to implement the shutdown yourself.
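As a rough illustration of the metrics idea, the sketch below compares two scrapes in the Prometheus text exposition format and reports whether a task-related metric changed between them. The helper names and the metric name `example_tasks_finished_total` are placeholders (assumptions); check which metrics your Ray version actually exports and how you scrape them.

```python
def sum_metric(text, name):
    """Sum all samples of metric `name` in Prometheus text-format output."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Sample lines look like: name{label="v",...} value [timestamp]
        if "}" in line:
            head, _, rest = line.rpartition("}")
            metric = head.split("{", 1)[0]
        else:
            metric, _, rest = line.partition(" ")
        if metric != name:
            continue
        total += float(rest.split()[0])  # first field after the name is the value
    return total


def had_activity(prev_scrape, curr_scrape, name):
    """True if the summed metric changed between two scrapes."""
    return sum_metric(curr_scrape, name) != sum_metric(prev_scrape, name)


# Hypothetical scrape output; the metric name is a placeholder.
before = """# TYPE example_tasks_finished_total counter
example_tasks_finished_total{node="a"} 10
example_tasks_finished_total{node="b"} 4
"""
after = """# TYPE example_tasks_finished_total counter
example_tasks_finished_total{node="a"} 12
example_tasks_finished_total{node="b"} 4
"""
print(had_activity(before, after, "example_tasks_finished_total"))
```

A watchdog could scrape on a fixed interval and trigger its own shutdown logic once no change has been seen for the chosen idle period.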

The autoscaler can terminate worker nodes (but not the head node) after an idle period, which is set with `idle_timeout_minutes` in the cluster config.