How does ray pick which job to run when there are more jobs than available resources

Soichi_Hayashi · December 13, 2022, 8:46pm

Hello. I’ve been trying to find information about how Ray picks the “next” job to execute when there are multiple jobs in the queue that are submitted by multiple workflows.

For example, Let’s say I submit a workflow with 1000 jobs, and while those 1000 jobs are being executed I submit another 100 jobs. It looks like Ray will not wait for the first 1000 jobs to finish but it starts executing some of the 100 jobs that I’ve submitted concurrently. Does Ray pick next job randomly from the queue? If it picks jobs randomly, is there way to “penalize” the priorities of the first 1000 jobs so that other jobs will be more likely to be processed? Can I configure this behavior somehow via configuration files?

ClarenceNg · December 15, 2022, 12:45am

There isn’t a way to specifically configure priority, but it is possible to wait for the first batch to complete before submitting the second. Once it is submitted, there is not a way to penalize and prioritize for what is later submitted.

import ray

@ray.remote
def one():
    pass

@ray.remote
def two():
    pass


ones = [one.remote() for i in range(3)]

ray.get(ones)
print("finished ones")

twos = [two.remote() for i in range(3)]

ray.get(twos)
print("finished twos")

Soichi_Hayashi · December 15, 2022, 1:22am

Thanks for the tip! @ClarenceNg

If I don’t wait for the first batch to complete and submit the second batch, how does ray handle the ordering of job execution? Will ray start processing the second batch before completing all first jobs?

ClarenceNg · December 15, 2022, 3:21pm

There are factors at play that affects when a task completes, like the variance in the processing time, whether task is retried, etc. Even in the perfect scenario we don’t guarantee specific ordering of execution once the task is submitted. It is possible that they are executed in FIFO but that is not a guarantee that is provided. If the tasks in the two batches are identical it is very possible that it executes in FIFO order.

To wait for completion the best bet is to look into functions like ray.get or ray.wait

Soichi_Hayashi · December 30, 2022, 3:14pm

@ClarenceNg I actually don’t want FIFO. What I want is for ray to provide “fair execution” of multiple workflows submitted more/less at the same time. For example, I am using ray as a backend to run workflows submitted from a multi-user web application. When multiple users submit workflows simultaneously, all requests will be more/less submitted concurrently to a single ray cluster. I want all workflows to be executed concurrently without a single user taking over the whole cluster for extended period of time. I definitely don’t want FIFO in this case as that means all users will have to wait for other users’ jobs to finish - especially if first user submits a large job and other users jobs are much smaller.

Am I correct to assume that ray itself does not provide such “fair execution” execution policy? I am familiar with using batch schedulers like slurm, htcondor that handles multi-user environment, but I am wondering if there is anything I can use on-top-of ray itself to make such “fair execution” possible. Has anyone worked on a layer on top of lay to make that happen?? Ideally, I’d like to be able to configure ray’s internal scheduling system, but is ray strictly designed for a single user environment?

Topic		Replies	Views
Ray submission queue Kubernetes	1	791	April 7, 2021
Queue difference between JobSubmission vs normal ray job from ray init Ray Core	2	491	December 8, 2022
Schedule tasks with dependency tasks Ray Core	6	814	December 13, 2022
Question: Task Scheduling and prioritization Ray Core	1	111	November 2, 2024
Is ray serve queue FIFO?	0	20	November 8, 2024

How does ray pick which job to run when there are more jobs than available resources

Related topics