How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hi Ray community,
I am learning Ray for a parallel data processing project at university. I have a large dataframe (40 GB), and I want to split each cell of one column into two columns. My Ray cluster connects two nodes.
At the split stage, my computer cannot finish processing the task; the kernel always dies at some point. Is there any way to modify the code so that it takes advantage of Ray's parallel processing?
Thank you very much!
My code is as follows:
import ray
import modin.pandas as pd
ray.init(address="10.203.81.23:6379")
df1 = pd.read_csv('./transactions_1.csv')
df2 = pd.read_csv('./transactions_2.csv')
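# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([df1, df2])
# Split 'timestamp' on the space into separate 'date' and 'time' columns
df[['date', 'time']] = df['timestamp'].str.split(' ', expand=True)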
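To make the intended per-row result of the split concrete, here is a minimal standard-library sketch. It assumes each timestamp looks like `2021-01-01 12:34:56`, with the date and time separated by a single space. The helper name `split_timestamp` and the sample values are made up for illustration and are not taken from the real CSV files.

```python
from typing import Tuple


def split_timestamp(ts: str) -> Tuple[str, str]:
    """Split a 'date time' string at the first space into (date, time)."""
    # partition returns (before, separator, after); the separator is discarded
    date, _, time = ts.partition(" ")
    return date, time


# Hypothetical sample values standing in for the 'timestamp' column
rows = ["2021-01-01 12:34:56", "2021-01-02 08:00:00"]
result = [split_timestamp(ts) for ts in rows]
print(result)
```

This only illustrates the output I am after for each row; it does not address the memory problem on the full 40 GB dataframe.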