Auto Termination feature

ckapoor · June 29, 2023, 6:53pm

Low: It annoys or frustrates me for a moment.

Even though it’s marked as Low priority, the overall impact is huge when you look at the costs.

As the title suggests, I am looking for an Auto Termination feature where a Ray Cluster shuts down after ‘N’ minutes of no activity.

Although the cluster autoscaler works, it does not solve these two issues:

1. The head node keeps running, and for our workloads it’s a bigger machine (4xlarge or higher).
2. Some workloads that require GPUs must keep a fixed set of nodes, which requires min and max to be the same for the instance count. In this scenario, the autoscaler will not scale the nodes down even when the tasks are complete.
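A possible stopgap, sketched here under assumptions rather than as a built-in Ray feature: a small watchdog script on the head node that polls Ray’s resource view and fires a callback after N consecutive idle minutes. It treats “no CPU or GPU currently reserved” as idle, which is an assumption: work running outside Ray tasks/actors, or actors that reserve zero CPUs, would not count as activity. This also covers the min == max GPU case, because idle GPU nodes show all their GPUs as available. The helpers is_idle, IdleWatchdog, and watch are hypothetical; ray.init, ray.cluster_resources(), ray.available_resources(), and ray.shutdown() are existing Ray Core calls.

```python
import time
from typing import Callable, Dict, Iterable, Optional

Resources = Dict[str, float]


def is_idle(total: Resources, available: Resources,
            keys: Iterable[str] = ("CPU", "GPU")) -> bool:
    """Treat the cluster as idle when no tracked resource is in use.

    A key missing from `available` counts as 0 available, because
    ray.available_resources() may omit resources that are fully used.
    """
    for key in keys:
        in_use = total.get(key, 0.0) - available.get(key, 0.0)
        if in_use > 1e-6:
            return False
    return True


class IdleWatchdog:
    """Reports True once the cluster has been idle for `idle_minutes` in a row."""

    def __init__(self, idle_minutes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_seconds = idle_minutes * 60.0
        self.clock = clock
        self.idle_since: Optional[float] = None

    def observe(self, idle: bool) -> bool:
        now = self.clock()
        if not idle:
            self.idle_since = None  # any activity resets the countdown
            return False
        if self.idle_since is None:
            self.idle_since = now  # first idle sample starts the countdown
        return now - self.idle_since >= self.idle_seconds


def watch(idle_minutes: float, on_idle: Callable[[], None],
          poll_seconds: float = 30.0) -> None:
    """Poll the running cluster and call `on_idle` after N idle minutes."""
    import ray  # lazy import so the logic above works without Ray installed

    ray.init(address="auto")  # attach to the already-running cluster
    watchdog = IdleWatchdog(idle_minutes)
    while True:
        idle = is_idle(ray.cluster_resources(), ray.available_resources())
        if watchdog.observe(idle):
            ray.shutdown()
            on_idle()
            return
        time.sleep(poll_seconds)
```

For example, watch(idle_minutes=30, on_idle=...) with a callback that tears the cluster down. The 30-second poll interval is an arbitrary default, and the injectable clock exists only so the countdown logic can be tested without waiting in real time.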

I’m curious whether this is supported, whether it’s on the roadmap, or whether there is a suggested solution somewhere.

Thanks for reading,
Charu

Jeff_Lutz · June 6, 2024, 4:06pm

Charu, so is there no way to have Ray actually terminate the EC2 instances when running this Ray CLI command?

ray down

Jeff
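For a cluster that was started with the cluster launcher (ray up), a script can trigger the same teardown by shelling out to ray down. This is a minimal sketch that assumes you supply the path to the launcher config used for ray up (cluster.yaml below is a hypothetical name). build_down_command and tear_down are illustrative helpers; ray down and its --yes flag, which skips the confirmation prompt, are part of the Ray CLI. Whether this applies depends on how the nodes were provisioned: with KubeRay or other setups, teardown goes through that system instead.

```python
import shlex
import subprocess
from typing import List


def build_down_command(config_path: str) -> List[str]:
    """Return the argv for tearing down a launcher-managed cluster."""
    # --yes skips the interactive confirmation, which a script cannot answer.
    return ["ray", "down", "--yes", config_path]


def tear_down(config_path: str, dry_run: bool = False) -> List[str]:
    """Run `ray down` for the given config, or only print it when dry_run is set."""
    cmd = build_down_command(config_path)
    print("Running:", shlex.join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if ray down exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, tear_down("cluster.yaml", dry_run=True) prints "Running: ray down --yes cluster.yaml" without touching any instances, which is a cheap way to check the command before wiring it into a scheduled job.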

Sam_Chan · June 6, 2024, 4:37pm

No. Management of the underlying compute substrate itself is up to you. You can use Kubernetes à la KubeRay, run your own, or subscribe to one of the managed Ray solutions that Anyscale also offers.

Sam_Chan · June 6, 2024, 4:38pm

Re-reading this: idle termination is actually one of the most-used features there. See “Overview of Anyscale Clusters” in the Anyscale Docs.
