Making the Ray scheduler pack workloads

Erkin · April 5, 2024, 10:14am

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

I'm using Ray Serve to serve LLMs, one model per Serve deployment, and each deployment may need a different number of GPUs (8, 4, or 1). When the small models are spread across the nodes, deployments of big models cannot find a single node with enough free GPUs. (We don't use distributed inference because we observed that the network overhead hurts performance.)

Is there a way to tweak Ray's scheduling so that it tries to place new deployments (or actors) on nodes that are already in use, filling those nodes first instead of spreading work across the cluster? (I'm asking in the Ray Core category because this is about the scheduler itself.)
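To make the fragmentation problem concrete, here is a minimal sketch in plain Python. It is a toy simulation, not Ray's actual scheduler: the `place` function, the `"spread"` and `"pack"` strategy names, and the two-node, 8-GPU cluster are all assumptions made up for illustration. It shows how placing two 1-GPU models on the least-utilized node leaves no node with 8 free GPUs, while filling the busiest node first keeps one node whole.

```python
from typing import List, Optional


def place(requests: List[int], num_nodes: int, gpus_per_node: int,
          strategy: str) -> List[Optional[int]]:
    """Toy placement: return the node index chosen for each GPU request,
    or None when no single node has enough free GPUs."""
    free = [gpus_per_node] * num_nodes  # free GPUs per node
    placements: List[Optional[int]] = []
    for need in requests:
        # Only nodes that can host the whole request qualify
        # (no distributed inference across nodes).
        candidates = [i for i, f in enumerate(free) if f >= need]
        if not candidates:
            placements.append(None)
            continue
        if strategy == "spread":
            # Prefer the node with the most free GPUs.
            node = max(candidates, key=lambda i: free[i])
        elif strategy == "pack":
            # Prefer the fullest node that still fits the request.
            node = min(candidates, key=lambda i: free[i])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        free[node] -= need
        placements.append(node)
    return placements


# Two small 1-GPU models arrive first, then one 8-GPU model.
requests = [1, 1, 8]
spread_result = place(requests, num_nodes=2, gpus_per_node=8, strategy="spread")
pack_result = place(requests, num_nodes=2, gpus_per_node=8, strategy="pack")
print("spread:", spread_result)  # [0, 1, None]: the 8-GPU model has no home
print("pack:  ", pack_result)    # [0, 0, 1]: node 1 stays free for the big model
```

The behavior I'm after corresponds to the `"pack"` branch: a best-fit choice that keeps whole nodes available for the 8-GPU deployments.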
