How severely does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty in completing my task, but I can work around it.
If I have 200 jobs to run, 100 A jobs and 100 B jobs, both requiring 1 GPU per job, and each B job has a corresponding prerequisite A job that must finish first, say I have 10 GPUs in total. Is there any mechanism to make sure the GPU resources are not left idle? In the extreme case, could Ray take all 10 GPUs and assign them to B jobs, which then can't begin because their corresponding A jobs are not finished yet?
Assume I submit the jobs without regard to priorities or dependencies, and just throw them all at the Ray cluster together, without any ordering.
This should just work out of the box. Ray automatically schedules a new task whenever resources become available, so when one task (either A or B) finishes, it schedules the next runnable task. Unless one of your B tasks requires multiple A tasks to finish first, your GPUs should stay fully used until there are fewer than 10 runnable tasks left in the cluster.
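For example, a rough sketch of that pattern for your 100 A / 100 B case (job_a and job_b are hypothetical placeholders for your real jobs; each requests one GPU):

```python
import ray

ray.init()  # assuming a cluster with 10 GPUs available

@ray.remote(num_gpus=1)
def job_a(i):
    # placeholder for the real A workload
    return i

@ray.remote(num_gpus=1)
def job_b(a_result):
    # placeholder for the real B workload; receives A's return value
    return a_result * 2

# Submit everything at once, with no manual ordering.
a_refs = [job_a.remote(i) for i in range(100)]
b_refs = [job_b.remote(a_ref) for a_ref in a_refs]

# A B task is only scheduled (and only reserves its GPU) once its A task
# has finished, so the 10 GPUs keep running whichever tasks are ready.
results = ray.get(b_refs)
```

Because a pending B task doesn't hold a GPU while it waits for its A task, the extreme case you describe (all 10 GPUs assigned to blocked B jobs) can't happen.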
Hmm, is that code not executable? I just want to emphasize that if the reference is passed to other remote tasks, those tasks are not scheduled until the upstream dependency has completed. Like:
```python
a_ref = a.remote()
# In this case, b wouldn't be scheduled until a is completed
b_ref = b.remote(a_ref)
```
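For completeness, a runnable version of that illustration might look like this (the a and b bodies are just placeholders):

```python
import ray

ray.init()

@ray.remote
def a():
    return "a finished"

@ray.remote
def b(a_result):
    # By the time b starts, Ray has resolved a_ref into a's return value.
    return f"b saw: {a_result}"

a_ref = a.remote()
b_ref = b.remote(a_ref)   # b is not scheduled until a has completed
print(ray.get(b_ref))     # prints "b saw: a finished"
```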
Thanks, but how does Ray know that b depends on a?
If I run b without a, it runs fine without any error (though it didn't run the code of do_something_in_b).