Switch off autoscaler

Is there a way to remove/switch off the autoscaler? I have a cluster workflow that starts a head node and runs a fixed number of models. Each model is run on its own worker node. As each worker completes, I would ideally like it to be removed. In the YAML config I have tried setting:

upscaling_speed: 0.0 # I know the docs say not to, but this seemed like the logical way to do it

I have also tried setting initial_workers, max_workers, and min_workers to the exact number of workers required, and setting target_utilization_fraction to 1.0.

The problem is that the process running on each worker (by design) uses almost all of the CPU capacity, so it triggers autoscaling. I would like to switch this off. Many thanks
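p.s. For context, the launch pattern looks roughly like this (just a sketch; the model names are placeholders):

    import ray

    ray.init(address="auto")  # connect to the head node started by the cluster launcher

    @ray.remote
    def f(model):
        # runs one model; by design this keeps nearly all of the worker
        # node's CPUs busy until it finishes
        ...

    models = ["model_a", "model_b"]  # fixed list, known up front (placeholder names)
    object_ids = [f.options(name=model).remote(model) for model in models]
    results = ray.get(object_ids)  # once these finish, the workers are no longer needed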

How many extra instances are you seeing (# of worker nodes - # of models running)? It sounds like the desired behavior should be possible just by setting target_utilization_fraction to 1.

Thanks for your suggestion, though I have already tried this and it still autoscales. I have tried setting target_utilization_fraction to 1.1 but it errors with:

Failed validating 'maximum' in schema['properties']['target_utilization_fraction']:
    {'description': 'DEPRECATED. Use upscaling_speed instead.',
     'maximum': 1,
     'minimum': 0,
     'type': 'number'}

On instance['target_utilization_fraction']:
    1.1

I have also tried setting upscaling_speed to a negative value, but it fails with:

Failed validating 'minimum' in schema['properties']['upscaling_speed']:

The cluster config is accepted with either (or both) of:

target_utilization_fraction: 1.0
upscaling_speed: 0.0

But autoscaling still takes place up to the maximum number of workers allowed by the cluster, and it is not clear why. I have capped it at 10 workers, but 7 of these sit idle, because when I call:

object_ids = [f.options(name=model).remote(model) for model in models]

models is a list of only 3, so the cluster should be able to accept dynamic list sizes and scale appropriately. It makes sense that it's desirable to autoscale when CPU usage exceeds a set threshold, but there should also be an option to turn this off for circumstances where full CPU usage is a design feature of the process.

p.s. I have tried setting upscaling_speed: 0.0001, which does 'kind of' work, but it always starts 1 more worker than I need, which just sits idle.

cc @Alex is there any way to get the desired behavior here? I suspect it might be possible with custom resources, but I'm wondering if there's a simpler workaround.
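Something along these lines is what I have in mind, in case it's useful (untested sketch; the "model_slot" resource name is made up, and each worker node's entry in the cluster YAML would need to advertise it, e.g. resources: {"model_slot": 1}):

    import ray

    ray.init(address="auto")

    # Ask for one "model_slot" per task. With exactly one slot advertised per
    # worker node, at most one model task can be placed on each node,
    # regardless of how busy that node's CPUs are.
    @ray.remote(resources={"model_slot": 1})
    def f(model):
        ...

    models = ["model_a", "model_b", "model_c"]  # placeholder names
    object_ids = [f.options(name=model).remote(model) for model in models]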

cc @Alex_Wu. Can you please take a look ^?

I have set the nested @ray.remote(num_cpus=cpus-1) as a workaround for now, so a cluster of n 16-core boxes will only use 15 cores each for nested Ray processes. This is not ideal, though, as I am not using full CPU capacity.
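Concretely, the workaround looks something like this (a sketch; the boxes here are 16-core, and the nested work inside each model is elided):

    import ray

    ray.init(address="auto")

    cpus = 16  # cores per worker box in this cluster

    # Reserve one core fewer than the node actually has, so the task never
    # claims the node's full CPU allocation.
    @ray.remote(num_cpus=cpus - 1)
    def f(model):
        # the model's own nested remote calls / worker processes go here
        ...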

I notice that this actually still triggers autoscaling briefly. I really do need a proper solution that switches off the autoscaler and lets Ray natively fix the cluster size to the number of remote function calls. If anyone can help with this it would be really appreciated. Thanks