I’m new to Ray and finding it great, but there’s one giant gap in my understanding: how do I use to core Ray functionality to implement the reduce step of a MapReduce algorithm?
MapReduce is such a fundamental concurrency algorithm that I’d expect this to be really obvious. Like there would be core Ray functions called “reducer”, the Getting Started part of the documentation would have a word count example, etc. But there’s almost no mention of MapReduce either in the documentation or online. The top Google result I get for “ray mapreduce” is a link to a long-abandoned Ray design patterns page from version 1.10.1. This surprises me.
To make things concrete, here’s a simple problem. I have a bunch of number lists, each of which has a unique name A, B, C etc.
A: 10, 3, 5, 7, 19 B: 2, 4, 6, 7, 5, 100 C: 10, 9, 5, 3, 2, 1 ...
I want to transform this into lists with the same names where each number has been incremented by one, running the “increment by one” operation in parallel.
A: 11, 4, 6, 8, 20 B: 3, 5, 7, 8, 6, 101 C: 11, 10, 6, 4, 3, 2 ...
I’m confident that I can come up with a scalable map step of this process in an idiomatically Ray-like fashion, but I’m not sure how to use Ray to make the reduce step scalable. The best strategy I’ve seen is in the blog post Executing a distributed shuffle without a MapReduce system, but that uses the kind of complicated programming that I want Ray to do for me.
Am I misunderstanding something fundamental about Ray’s design?