Data exchange between workers

Hi,

I need some advice on how I can structure data exchange between workers.
My project has one task that produces lots of coordinates and a second task that evaluates a function (and does some other work) on each coordinate, returning the result to an owner that processes it.
The time both tasks take will vary over the execution of the program.
I want both tasks to run at the same time, so that coordinates are produced and buffered somewhere, and whenever task two is free it can pick up a new coordinate to evaluate.
What would be the best data structure to share data between the two tasks so they can work independently and in parallel? How do I make it accessible to both tasks?
Is this possible with ray or do I have to use some other framework/library?

Thank you very much.
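(For context: independent of Ray, what is being described is a classic bounded producer/consumer queue. A minimal single-process sketch with Python's standard library, using a doubled value as a stand-in for the real evaluation function; Ray offers a distributed analogue in ray.util.queue.Queue, which has a similar put/get interface:)

```python
import queue
import threading

def producer(coords, q):
    # Task 1: generate coordinates and hand them off as soon as they exist.
    for c in coords:
        q.put(c)            # blocks when the queue is full (backpressure)
    q.put(None)             # sentinel: no more coordinates

def consumer(q, results):
    # Task 2: evaluate a coordinate whenever one becomes available.
    while True:
        c = q.get()
        if c is None:       # sentinel received, stop consuming
            break
        results.append(c * 2)  # stand-in for the real evaluation

q = queue.Queue(maxsize=8)  # bounded buffer shared by both tasks
results = []
t1 = threading.Thread(target=producer, args=(range(100), q))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the queue is bounded, the producer naturally slows down when the consumer falls behind, which matters when the two tasks' speeds vary over the run.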

Would you want this to happen in a streaming manner, or will evaluation always run “after” a batch of coordinates has been produced?

Ideally, this would happen in a streaming manner; otherwise the waiting time for a batch might become very long.

Is this something Datasets can implement? See Datasets: Distributed Arrow on Ray — Ray v2.0.0.dev0.

For example (I'm not sure how your coordinates are produced, but say we have a function get_coord(i) that computes the i-th coordinate):

ds = ray.data.range(num_coords)
ds = ds.map(get_coord)
ds = ds.map(evaluate_coord)

To perform this in a streaming manner, you can use the dataset pipelining feature:
pipe = ray.data.range(num_coords).pipeline(parallelism=10)
pipe = pipe.map(get_coord)
pipe = pipe.map(evaluate_coord)