Controlling Scaling based on jobs in queue

I’m a newbie at distributed computing, so I’m not sure if this approach makes sense, but let’s say I have problems that create anywhere from a dozen to several thousand jobs. These jobs can take a few seconds or a few minutes, and each uses 1 or 2 cores and sometimes a GPU.

I don’t want to schedule 1,000 jobs and spin up 1,000 cores, because if each job takes only a few seconds, I’m perfectly fine having a queue of 100 jobs waiting for just a couple of CPUs.

So my problem is essentially that even if a CPU is “busy” at the moment I submit another job, that doesn’t mean it won’t be free within a few seconds, so scaling out to another node would be a massive waste of resources.

Ideally I’d want to keep track of the number of jobs I’ve sent, the resources they need, and a rough time estimate on my end, and use that to judge whether the backlog is acceptable or requires more resources (and hence more nodes).
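To make that concrete, here is a minimal sketch of the heuristic I have in mind. All names and thresholds here are my own assumptions (e.g. `max_acceptable_wait`), not anything from Ray:

```python
# Hypothetical backlog heuristic: decide whether the current queue
# justifies adding worker nodes. All names and thresholds are assumptions.

def needs_more_nodes(queued_jobs, est_seconds_per_job,
                     total_cores, max_acceptable_wait=60.0):
    """Return True if the estimated time to drain the backlog on the
    existing cores exceeds the acceptable wait budget."""
    # Jobs are short, so even a busy core frees up within roughly one
    # job duration; treat every core as eventually available.
    effective_cores = max(total_cores, 1)
    est_drain_time = queued_jobs * est_seconds_per_job / effective_cores
    return est_drain_time > max_acceptable_wait

# 100 queued 5-second jobs on 8 cores drain in ~62.5 s -> scale up
print(needs_more_nodes(100, 5.0, 8))   # True
# 100 queued 2-second jobs on 8 cores drain in 25 s -> stay put
print(needs_more_nodes(100, 2.0, 8))   # False
```

The point is just that the scaling decision depends on estimated drain time, not on whether cores happen to be busy right now.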

Ray has a built-in autoscaler, and this seems like a good use case for it: Cluster Autoscaling — Ray v1.2.0

If you have any questions about it, please let me know! Basically, you can specify the number of nodes you want and the resources per node ahead of time, and the autoscaler will ensure your cluster doesn’t grow beyond that.
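As a rough sketch of what that looks like (field names from the Ray 1.x cluster YAML; the exact schema and instance types depend on your cloud provider, and the values below are just illustrative):

```yaml
# Sketch of a Ray 1.x cluster config; AWS used as an example provider.
cluster_name: short-jobs
min_workers: 0            # start with only the head node
max_workers: 10           # the autoscaler will never exceed 10 workers
idle_timeout_minutes: 5   # tear down workers after 5 idle minutes
provider:
  type: aws
  region: us-west-2
head_node:
  InstanceType: m5.xlarge
worker_nodes:
  InstanceType: m5.2xlarge   # 8 vCPUs per worker node
```

With `min_workers: 0` and a short idle timeout, the cluster stays small when the backlog is manageable and only grows (up to the cap) when tasks queue up.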