Say I have a function `copy_data` that copies data from an outside source (e.g., S3, GCS, a database) to a VM, and I want to run that function exactly once on each node in a cluster. Is there a good way to do that?
If I have, say, 5 nodes with 32 cores each, I've been decorating the function with `@ray.remote(num_cpus=32)` and launching it 5 times, so that each task has to claim a whole node, but I thought there might be a better way.
If it helps, copying data is just one application for this functionality.