How do I implement a reducer in Ray?

wpm · April 15, 2023, 9:42pm

I’m new to Ray and finding it great, but there’s one giant gap in my understanding: how do I use to core Ray functionality to implement the reduce step of a MapReduce algorithm?

MapReduce is such a fundamental concurrency algorithm that I’d expect this to be really obvious. Like there would be core Ray functions called “reducer”, the Getting Started part of the documentation would have a word count example, etc. But there’s almost no mention of MapReduce either in the documentation or online. The top Google result I get for “ray mapreduce” is a link to a long-abandoned Ray design patterns page from version 1.10.1. This surprises me.

To make things concrete, here’s a simple problem. I have a bunch of number lists, each of which has a unique name A, B, C etc.

A: 10, 3, 5, 7, 19
B: 2, 4, 6, 7, 5, 100
C: 10, 9, 5, 3, 2, 1
...

I want to transform this into lists with the same names where each number has been incremented by one, running the “increment by one” operation in parallel.

A: 11, 4, 6, 8, 20
B: 3, 5, 7, 8, 6, 101
C: 11, 10, 6, 4, 3, 2
...

I’m confident that I can come up with a scalable map step of this process in an idiomatically Ray-like fashion, but I’m not sure how to use Ray to make the reduce step scalable. The best strategy I’ve seen is in the blog post Executing a distributed shuffle without a MapReduce system, but that uses the kind of complicated programming that I want Ray to do for me.

Am I misunderstanding something fundamental about Ray’s design?

Topic		Replies	Views
Allreduce primitive? Ray Core	2	573	March 5, 2021
Ray.util.collective.allreduce Ray Core	2	396	August 20, 2022
Parallelising loops using ray Core Ray Core	1	236	October 12, 2023
Reproducing the result in Ray Paper Ray Core	5	319	August 25, 2022
Using ray for data processing	2	687	January 6, 2023

How do I implement a reducer in Ray?

Related topics