I have data preprocessing code that consists of a list of tasks, none of which is internally parallelizable.
Is it possible to run them in parallel on different workers?
I tried the following, but it does not seem to work: I see only one worker active.
import ray
import pandas as pd
df = pd.read_csv('file.csv')
ray.init(num_cpus=1)
df_ref = ray.put(df)  # put the DataFrame in the object store once so both tasks can share it
@ray.remote
def pow2(data):
    # Square the 'values' column row by row (kept sequential on purpose)
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 2)
    return pows
@ray.remote
def pow3(data):
    # Cube the 'values' column row by row (kept sequential on purpose)
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 3)
    return pows
# Submit both tasks first, then block once for both results
result_pow2 = pow2.remote(df_ref)
result_pow3 = pow3.remote(df_ref)
pow2_result, pow3_result = ray.get([result_pow2, result_pow3])
df["pow2"] = pow2_result
df["pow3"] = pow3_result
print(df)
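For reference, the real code has a longer list of tasks; the pattern I want is roughly the following sketch (the task list here just reuses pow2 and pow3 for illustration):

tasks = [("pow2", pow2), ("pow3", pow3)]
refs = [task.remote(df_ref) for _, task in tasks]  # submit every task before blocking
for (name, _), result in zip(tasks, ray.get(refs)):  # one blocking call for all results
    df[name] = result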
Note: the tasks in the example may easily be parallelized… Please just consider the workflow.
I would also like to avoid rewriting the code into a batching approach, as I need to quickly test different things…