- Low: It annoys or frustrates me for a moment.
Even though this is marked as Low priority, the overall impact is large once you look at the costs.
As the title suggests, I am looking for an Auto Termination feature where a Ray Cluster shuts down after ‘N’ minutes of no activity.
Although the cluster autoscaler works, it does not solve these two issues:
- The head node keeps running, and it's a larger machine (4xlarge or higher) for our workloads.
- Some GPU workloads must keep a fixed set of nodes, so the min and max instance counts have to be the same. In this scenario, the autoscaler will not scale the nodes down even when the tasks are complete.
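To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is not an existing Ray feature: `IdleWatchdog`, `is_busy`, `shutdown`, and `idle_timeout_s` are all hypothetical names, and the activity probe and the shutdown action are passed in as plain callables so the sketch makes no assumptions about Ray internals.

```python
import time
from typing import Callable


class IdleWatchdog:
    """Hypothetical helper: call `shutdown` once `is_busy` has
    reported no activity for `idle_timeout_s` seconds in a row."""

    def __init__(
        self,
        is_busy: Callable[[], bool],      # assumed probe: True while work is running
        shutdown: Callable[[], None],     # assumed action: tears the whole cluster down
        idle_timeout_s: float,            # the 'N' minutes, expressed in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._is_busy = is_busy
        self._shutdown = shutdown
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_active = clock()

    def poll(self) -> bool:
        """Check activity once; return True if shutdown was triggered."""
        now = self._clock()
        if self._is_busy():
            # Any activity resets the idle timer.
            self._last_active = now
            return False
        if now - self._last_active >= self._idle_timeout_s:
            self._shutdown()
            return True
        return False
```

One possible probe, as an assumption rather than a recommendation, would compare `ray.available_resources()` against `ray.cluster_resources()` and treat the cluster as idle when nothing is in use. The shutdown callable would need to stop the head node as well as the fixed-size GPU workers, which is exactly the part the autoscaler does not cover today.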
I am curious to know whether this is supported, is on the roadmap, or has a suggested workaround somewhere.
Thanks for reading,