On-premises cluster: Waking servers dynamically

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi everyone,
I have some questions about using Ray with an on-premises cluster.
I’m currently using Ray with the autoscaler, and it’s working properly. However, to save electricity and extend the lifespan of the servers, I keep them asleep most of the time.
Wake-on-LAN is set up, so I manually wake the servers before starting Ray and put them back to sleep once the jobs are done.
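
For context, the manual wake step can be sketched roughly as below. This is a minimal illustration, not something Ray provides: the function names `build_magic_packet` and `wake` and the MAC address in the usage comment are placeholders, and it assumes the servers accept standard Wake-on-LAN magic packets sent as a UDP broadcast on port 9.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as 'aa:bb:cc:dd:ee:ff'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # A magic packet is 6 bytes of 0xFF followed by the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example usage with a placeholder MAC address:
# wake("aa:bb:cc:dd:ee:01")
```

Right now I run something like this by hand for each worker before starting Ray, which is the step I would like to trigger automatically.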

I was wondering whether Ray supports pre-start and post-stop hooks, so that the wake and sleep steps could run automatically.

I have considered using a cluster manager (specifically Slurm), but Ray on Slurm doesn’t seem to support using the autoscaler and waking the servers based on load.

Is there a recommended way of doing this, and is there something I’m missing?

Thanks!