Controlling Scaling based on jobs in queue

I’m a newbie at distributed computing, so I’m not sure if this approach makes sense, but let’s say I have problems that create anywhere from a dozen to several thousand jobs. These jobs can take a few seconds or a few minutes, and each uses 1 or 2 cores and sometimes a GPU.

I don’t want to schedule 1,000 jobs and spin up 1,000 cores, because if each job takes only a few seconds, I’m perfectly fine having a queue of 100 jobs waiting for just a couple of CPUs.

So my problem is essentially that even if a CPU is “busy” at the moment I submit another job, that doesn’t mean it won’t be free within a few seconds, so scaling out to another node would be a massive waste of resources.

Ideally I’d want to keep track of the number of jobs I’ve sent, the resources they need, and a rough time estimate on my end, and use that to judge whether the backlog is acceptable or requires more resources (and hence more nodes).
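To make that concrete, here is a minimal sketch of the heuristic I have in mind. All names and thresholds here are my own assumptions (e.g. `max_acceptable_wait`), not anything from Ray:

```python
# Hypothetical backlog heuristic: decide whether the current queue
# justifies adding worker nodes. All names and thresholds are assumptions.

def needs_more_nodes(queued_jobs, est_seconds_per_job,
                     total_cores, max_acceptable_wait=60.0):
    """Return True if the estimated time to drain the backlog on the
    existing cores exceeds the acceptable wait budget."""
    # Jobs are short, so even a busy core frees up within roughly one
    # job duration; treat every core as eventually available.
    effective_cores = max(total_cores, 1)
    est_drain_time = queued_jobs * est_seconds_per_job / effective_cores
    return est_drain_time > max_acceptable_wait

# 100 queued 5-second jobs on 8 cores drain in ~62.5 s -> scale up
print(needs_more_nodes(100, 5.0, 8))   # True
# 100 queued 2-second jobs on 8 cores drain in 25 s -> stay put
print(needs_more_nodes(100, 2.0, 8))   # False
```

The point is just that the scaling decision depends on estimated drain time, not on whether cores happen to be busy right now.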

Ray has a built-in autoscaler, and this seems like a good use case for it: Cluster Autoscaling — Ray v1.2.0

If you have any questions about it, please let me know! Basically, you can specify the number of nodes you want and the resources per node ahead of time, and the autoscaler will ensure your cluster doesn’t grow beyond that.
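As a rough sketch of what that looks like (field names from the Ray 1.x cluster YAML; the exact schema and instance types depend on your cloud provider, and the values below are just illustrative):

```yaml
# Sketch of a Ray 1.x cluster config; AWS used as an example provider.
cluster_name: short-jobs
min_workers: 0            # start with only the head node
max_workers: 10           # the autoscaler will never exceed 10 workers
idle_timeout_minutes: 5   # tear down workers after 5 idle minutes
provider:
  type: aws
  region: us-west-2
head_node:
  InstanceType: m5.xlarge
worker_nodes:
  InstanceType: m5.2xlarge   # 8 vCPUs per worker node
```

With `min_workers: 0` and a short idle timeout, the cluster stays small when the backlog is manageable and only grows (up to the cap) when tasks queue up.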