I am new to Ray and have a problem in which I have a list of combinations that I need to process. This list can contain 10000 elements, and instead of going through them 1 by 1 in a traditional for loop, I thought that I could process them in parallel. E.g., one worker processes the first chunk of 100 combinations, another worker processes the next 100, and so on until all chunks are covered.
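To make the chunking idea concrete (independent of Ray), here is a minimal, runnable sketch; the combination values are made up for illustration:

```python
# Split a list into consecutive chunks of size n (the last chunk may be shorter).
def chunk(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# With 10000 combinations and a chunk size of 100, this yields 100 chunks,
# each of which could be handed to a separate worker.
all_combos = [(i, i + 1) for i in range(10000)]  # placeholder combinations
chunks = list(chunk(all_combos, 100))
```

Each element of `chunks` is then an independent unit of work.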
I have illustrated my Python flow below. With this flow, I am not seeing any significant speed-up compared to running the same computation without Ray.
I am probably using Ray the wrong way, and I hope someone on this forum can point out what I should change in order to use Ray in a correct and efficient manner.
```python
import ray

# Create chunks of size n
def chunk(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# The function that executes the computations
def process_combos(combo, dataframe1, dataframe2):
    ...
    res_df = some_other_func(combo, dataframe1, dataframe2)
    ...
    temp_df = ...
    return temp_df

@ray.remote
def run_chunks(chunks, dataframe1, dataframe2):
    # Process every combination in this chunk sequentially on one worker
    return [process_combos(combo, dataframe1, dataframe2) for combo in chunks]

ray.init()

# All the combinations that I have to process (the list might contain 10000 elements)
all_combos = [(1, 2), (3, 2), (1, 9), ...]

chunk_size = 100
chunked_list = chunk(all_combos, chunk_size)

# One remote task per chunk
output = ray.get([run_chunks.remote(chunks, dataframe1, dataframe2)
                  for chunks in chunked_list])
```