I need some advice on how I can structure data exchange between workers.
My project has one task that creates lots of coordinates and a second task that evaluates a function (and does some other stuff) with the coordinate and returns the results to some owner that processes the results.
The time both tasks take will vary over the execution of the program.
I want to execute both tasks at the same time so that coordinates are produced, put in some place and whenever task two is free it can access a new coordinate to evaluate.
What would be the best data structure to share data between the two tasks so they can work independently and in parallel? How do I make it accessible to both tasks?
Is this possible with ray or do I have to use some other framework/library?
To perform this in a streaming manner, you can use the dataset pipelining feature:
pipe = ray.data.range(num_coords).pipeline(parallelism=10)
pipe = pipe.map(get_coord)
pipe = pipe.map(evaluate_coord)