I have data preprocessing code that consists of a list of tasks, none of which is internally parallelizable.
Is it possible to run them in parallel on different workers?
I tried the following, but it does not seem to work: I see only one worker active.
import ray
import pandas as pd
df = pd.read_csv('file.csv')
ray.init(num_cpus=1)
df_ref = ray.put(df)  # put the DataFrame in the object store once so both tasks can share it
@ray.remote
def pow2(data):
    # Square the 'values' column row by row (kept sequential on purpose)
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 2)
    return pows
@ray.remote
def pow3(data):
    # Cube the 'values' column row by row (kept sequential on purpose)
    pows = []
    for i in range(len(data)):
        pows.append(data['values'].iloc[i] ** 3)
    return pows
# Submit both tasks first, then block once for both results
result_pow2 = pow2.remote(df_ref)
result_pow3 = pow3.remote(df_ref)
pow2_result, pow3_result = ray.get([result_pow2, result_pow3])
df["pow2"] = pow2_result
df["pow3"] = pow3_result
print(df)
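For reference, the real code has a longer list of tasks; the pattern I want is roughly the following sketch (the task list here just reuses pow2 and pow3 for illustration):

tasks = [("pow2", pow2), ("pow3", pow3)]
refs = [task.remote(df_ref) for _, task in tasks]  # submit every task before blocking
for (name, _), result in zip(tasks, ray.get(refs)):  # one blocking call for all results
    df[name] = result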
Note: the tasks in the example may easily be parallelized… Please just consider the workflow.
I would also like to avoid rewriting the code into a batching approach, as I need to quickly test different things…