Hi,
I am a new Ray user and have a problem in which I have a list of combinations that I need to process. This list can contain 10,000 elements, and instead of going through them one by one in a traditional for loop, I thought I could do it in parallel. E.g., one worker processes the first chunk of 100 elements, another processes the next chunk of 100, and so on until all chunks are covered.
I have illustrated my Python flow below. With this flow, I am not noticing any significant speed-up compared to running without Ray.
I am probably using Ray the wrong way, and I am hoping someone in this forum can point out what I should do to use Ray in a correct and efficient manner.
import ray
# Create chunks of size n
def chunk(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i+n]
# The function that executes computations
def process_combos(combo, dataframe1, dataframe2):
    ...
    res_df = some_other_func(combo, dataframe1, dataframe2)
    ...
    temp_df = ...
    return temp_df
# Process one chunk of combos (renamed the loop variable so it no longer
# shadows the chunk() helper above)
@ray.remote
def run_chunks(combos, dataframe1, dataframe2):
    return [process_combos(combo, dataframe1, dataframe2) for combo in combos]
ray.init()

# All the combinations I have to process (the list might contain 10,000 elements)
all_combos = [(1, 2), (3, 2), (1, 9), ...]

chunk_size = 100
chunked_list = chunk(all_combos, chunk_size)

# One remote task per chunk of 100 combos; dataframe1 and dataframe2 are defined elsewhere
output = ray.get([run_chunks.remote(combo_chunk, dataframe1, dataframe2) for combo_chunk in chunked_list])
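For what it's worth, the chunk() helper on its own behaves as intended; here is a minimal standalone check (the 10-element list and chunk size of 3 are just made-up values for illustration):

```python
# Standalone check of the chunk() helper: it splits a list into
# fixed-size pieces, with the last piece possibly shorter.
def chunk(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i+n]

pieces = list(chunk(list(range(10)), 3))
print(pieces)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

So with chunk_size = 100 and 10,000 combos, the list comprehension above should launch 100 remote tasks, one per chunk.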