Increase tune's concurrent trials to trigger Autoscaling

paparara · October 10, 2022, 2:43pm

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I’m running ray.tune trials on a Ray cluster. My problem is that ray.tune only runs as many concurrent trials as the head node has CPUs. How can I manually increase the number of concurrent trials so that the cluster autoscales?

-------- PS -------
I understand that I can specify CPU resources for each trial. However, in my case each trial only needs 1 CPU. I only want to increase the number of concurrent trials.
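A minimal sketch of the setup being described, assuming Ray 2.x's `Tuner` API: each trial reserves 1 CPU via `tune.with_resources`, `num_samples` requests more trials than the head node can run, and `max_concurrent_trials` sets an upper bound on how many run at once. The objective and the values 64/64 are hypothetical placeholders. `max_concurrent_trials` only caps concurrency; it does not add nodes by itself, so scale-up still depends on the cluster being configured for autoscaling.

```python
def objective(config):
    # Hypothetical toy objective standing in for a real 1-CPU training run;
    # Tune's function API accepts a returned dict as the final result.
    return {"score": (config["x"] - 0.3) ** 2}


def main(num_samples: int = 64, max_concurrent: int = 64):
    # Imported inside main so `objective` can be used without Ray installed.
    from ray import tune

    tuner = tune.Tuner(
        # Each trial reserves exactly 1 CPU, as in the question.
        tune.with_resources(objective, {"cpu": 1}),
        param_space={"x": tune.uniform(0.0, 1.0)},
        tune_config=tune.TuneConfig(
            # More samples than the head node has CPUs: trials that cannot
            # be placed stay pending, which is the unmet resource demand
            # an autoscaling cluster reacts to.
            num_samples=num_samples,
            # Upper bound on trials running at the same time.
            max_concurrent_trials=max_concurrent,
        ),
    )
    return tuner.fit()
```

Calling `main()` from a driver connected to the cluster launches the run; the returned object is Tune's `ResultGrid`.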

kai · October 26, 2022, 6:28pm

We’ve discussed this on Slack before, but I’m pasting my answer here for future reference:

The default for this setting depends on the search algorithm, but autoscaling should always be triggered if you request more trials than can fit on the current cluster and the cluster is configured for autoscaling. If you want to increase autoscaling speed, you can try adjusting the TUNE_MAX_PENDING_TRIALS_PG environment variable.
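A minimal sketch of setting the variable from Python, assuming it is read by the Tune driver process when the run starts, so it must be set before `tuner.fit()` is called. The value 64 is an arbitrary illustration, not a recommended setting. The shell equivalent would be `TUNE_MAX_PENDING_TRIALS_PG=64 python train.py`, where `train.py` is a hypothetical script name.

```python
import os

# Allow up to 64 trials to be pending (waiting for resources) at once, so the
# autoscaler sees more unmet demand. Illustrative value; set it before Tune
# starts running trials in this process.
os.environ["TUNE_MAX_PENDING_TRIALS_PG"] = "64"
```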

cosnicolaou · March 31, 2023, 10:50pm

FWIW, I’m seeing the same behaviour, i.e. my head node has 8 CPUs, my four workers have 64 CPUs each, and 8 GPUs. Each trial needs half a GPU, and ray.tune never schedules more than 8 concurrent trials. I can get it to use all 4 nodes by setting TUNE_MAX_PENDING_TRIALS_PG to a large number. However, when I look at the code in execution/trial_runner.py, I see that it determines the maximum number of pending trials by calling ray.cluster_resources().get("CPU", 1.0), which for my cluster returns 264. So I’m confused about how it ends up at 8. Any suggestions?
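To make the question concrete, here is a hypothetical sketch of a pending-trials cap of the kind described: an explicit TUNE_MAX_PENDING_TRIALS_PG value wins; otherwise the cap is derived from the cluster's CPU count using the same `.get("CPU", 1.0)` lookup quoted above. The function name and the `scale` and `floor` parameters are assumptions for illustration, not Ray's actual code or defaults, and the resource dict is passed in so the sketch runs without a cluster. With 8 + 4 × 64 = 264 CPUs and no override it yields 264; in this sketch, a cap of 8 would only come out if the resource dict contained just the head node's CPUs.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "TUNE_MAX_PENDING_TRIALS_PG"


def max_pending_trials(
    cluster_resources: Mapping[str, float],
    env: Optional[Mapping[str, str]] = None,
    scale: float = 1.0,  # hypothetical multiplier, not Ray's real default
    floor: int = 1,      # hypothetical lower bound, not Ray's real default
) -> int:
    """Hypothetical cap on how many trials may wait for resources at once."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    if override is not None:
        # An explicit integer setting takes precedence over the heuristic.
        return int(override)
    # Same lookup as quoted in the post: fall back to 1.0 if "CPU" is absent.
    cluster_cpus = cluster_resources.get("CPU", 1.0)
    return max(floor, int(cluster_cpus * scale))


full_cluster = {"CPU": 8.0 + 4 * 64.0}  # head node plus four workers
head_only = {"CPU": 8.0}
print(max_pending_trials(full_cluster, env={}))                 # 264
print(max_pending_trials(head_only, env={}))                    # 8
print(max_pending_trials(full_cluster, env={ENV_VAR: "1000"}))  # 1000
```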
