Speeding up the Ray scheduler

Hello everybody!

I have a simple question regarding the Ray scheduler.

A bit of context: my workload consists of many fine-grained tasks, typically 500-600, sometimes 1000, and ideally even more. I host the Ray cluster on Kubernetes, and my autoscaler uses AWS Karpenter to request resources. That part is very fast: I go from 1 CPU to 200-300 or more CPUs quickly. However, once the worker nodes are running, the Ray scheduler seems to trail behind; it takes quite a bit of time to fill the available CPUs with tasks, even though there are plenty of tasks pending. The problem is that this impedes further scaling up, and ideally I would like many more tasks running concurrently, faster than with my current setup.
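For reference, a simplified sketch of how the work is submitted (`process_item` is just a placeholder for my real task, which I've left out here):

```python
import ray

ray.init(address="auto")  # connect to the existing Ray cluster on Kubernetes

# Placeholder for my actual fine-grained task; each one is small and short-lived.
@ray.remote(num_cpus=1)
def process_item(item):
    # ... the real per-item work happens here ...
    return item

# Submit on the order of 500-1000+ tasks at once and wait for all of them.
refs = [process_item.remote(i) for i in range(1000)]
results = ray.get(refs)
```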

My main pain point is getting the Ray scheduler to utilize the available resources faster. Is there, for example, a way to allocate more resources to the scheduler itself? I couldn't find anything like that in the forums or the documentation. I'd be happy to hear how others have solved similar issues.
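To make the "trailing behind" part more concrete, a check roughly like the one below is what shows the gap: the total CPU count jumps up almost immediately after Karpenter adds nodes, but the number of busy CPUs climbs much more slowly, even with plenty of tasks queued (this is a simplified sketch, not my exact monitoring code):

```python
import time
import ray

ray.init(address="auto")

# Watch how quickly the scheduler actually occupies the CPUs the new nodes provide.
for _ in range(30):
    total = ray.cluster_resources().get("CPU", 0)
    free = ray.available_resources().get("CPU", 0)
    print(f"busy CPUs: {total - free:.0f} / {total:.0f}")
    time.sleep(5)
```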