Communication cost of job scheduling

Hi, say I want to dispatch tasks to multiple Slurm nodes. Is there any difference in the communication cost of job scheduling between defining one atomic task per unit of work and defining grouped atomic tasks?

1. define the atomic task

import ray

@ray.remote(num_cpus=1)
def atomic_task(args):
    return some_heavy_work(args)

2. define grouped atomic tasks

@ray.remote(num_cpus=1)
def grouped_atomic_tasks(list_of_args):
    return [some_heavy_work(args) for args in list_of_args]

list_of_args is a chunk of args manually grouped by the user for each worker.
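For concreteness, this is roughly how I would dispatch both versions; all_args and chunk_size below are just placeholders for my real workload:

import ray

ray.init(address="auto")  # assuming a Ray cluster is already running on the Slurm nodes

all_args = list(range(1000))  # placeholder arguments
chunk_size = 50               # placeholder chunk size

# Option 1: one Ray task per atomic unit of work
refs = [atomic_task.remote(args) for args in all_args]
results = ray.get(refs)

# Option 2: one Ray task per manually grouped chunk
chunks = [all_args[i:i + chunk_size] for i in range(0, len(all_args), chunk_size)]
refs = [grouped_atomic_tasks.remote(chunk) for chunk in chunks]
results = [r for chunk_results in ray.get(refs) for r in chunk_results]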

There’s definitely some task-scheduling overhead, but it shouldn’t be large (at most on the order of milliseconds per task, even in a large cluster). If each task is doing heavy work, it’s unlikely to be a problem.

Thanks! How about the cost of putting objects into the object store and having each worker get them back? Is there a significant difference between these two ways of defining jobs?

They should be really fast if you are using objects that support zero-copy serialization (e.g., NumPy arrays), because there is no deserialization cost when a worker accesses them.

If zero-copy is not supported, there is still a serialization cost, but the overhead of the put and get calls themselves shouldn’t be significant if your objects are reasonably large.
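As a minimal sketch of the zero-copy case (the array shape and the uses_array task below are made up): a large NumPy array is put into the object store once, and every task reads it without copying or deserializing it:

import numpy as np
import ray

ray.init(address="auto")

# Put the array into the object store once. Tasks running on the same node
# read it via shared memory as a zero-copy, read-only view.
shared_array = np.zeros((10_000, 1_000))  # made-up shape
array_ref = ray.put(shared_array)

@ray.remote(num_cpus=1)
def uses_array(arr, i):
    # arr arrives as a read-only NumPy view backed by the object store
    return float(arr[i].sum())

print(ray.get([uses_array.remote(array_ref, i) for i in range(8)]))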
