Hi, say I want to dispatch tasks to multiple slurm nodes. Is there any difference between defining the atomic task and defining grouped atomic tasks regarding the communication cost of job scheduling?
There’s definitely overhead of task scheduling though it won’t be too big (it should be at max order of milliseconds even in a large cluster). But if you are doing heavy work, it won’t likely to be a problem.
Thanks! How about the cost of putting objects into the object store and getting them back by each worker? Is there a significant difference between these two ways of defining jobs?
They should be really fast if you are using objects that support zero-copy serialization (e.g., numpy) because when worker accesses them, there’s no serialization cost.
If zero-copy is not supported, there’s still a serialization cost, but the put & get itself’s overhead shouldn’t be significant if your object is big enough.