Allreduce primitive?

Hi there,

I was wondering if there is a way to use allreduce in core Ray. The way I picture it is something like:

# on a remote worker:
ray.allreduce(x, operation='sum', axis_name='foo', axis_size=16)
x  # now holds the sum of all x's across all 16 workers

I imagine that Ray would block on the remote workers until it has collected axis_size=16 calls to allreduce along axis_name='foo'. It would then perform the reduction, hand the result back to every worker, and lift the block.
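To make the blocking behaviour I have in mind concrete, here is a minimal sketch using plain Python threads instead of Ray workers. It is not Ray code: `AllReduceGroup` and `run_workers` are hypothetical names invented for illustration, and only the 'sum' operation is modelled.

```python
import threading


class AllReduceGroup:
    """Hypothetical helper: blocks callers until `size` values have arrived,
    then returns the sum of those values to every caller."""

    def __init__(self, size):
        self.size = size
        self._values = []
        self._result = None
        self._lock = threading.Lock()
        # The barrier action runs exactly once, in one thread, after all
        # `size` callers have arrived and before any of them is released.
        self._barrier = threading.Barrier(size, action=self._reduce)

    def _reduce(self):
        self._result = sum(self._values)
        self._values = []  # reset so the group can be reused

    def allreduce(self, x):
        with self._lock:
            self._values.append(x)
        self._barrier.wait()  # block until every participant has called in
        return self._result


def run_workers(values):
    """Start one thread per value and collect what each one gets back."""
    group = AllReduceGroup(len(values))
    results = [None] * len(values)

    def worker(rank, x):
        results[rank] = group.allreduce(x)

    threads = [
        threading.Thread(target=worker, args=(rank, x))
        for rank, x in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With 16 workers contributing the values 0 through 15, every entry returned by `run_workers(list(range(16)))` is the same total, which is the behaviour I would want from an allreduce across remote workers.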

Is something like this possible? If not, do you think it’s a good idea to add this? I would love to have this kind of functionality.

Yeah!

This is now supported by ray.util.collective. It’s still a work in progress, but you should definitely try it out!

cc @zhisbug

Awesome, thanks!

I also like the design: defining a group and then referencing it by name.