How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
I have a TB-sized tabular dataset that runs through these preprocessing steps:

```python
import numpy as np
from ray.data.preprocessors import Chain, Concatenator

# cat_cols is my list of categorical column names.
preprocessor = Chain(
    Concatenator(include=cat_cols, dtype=np.int32, output_column_name="x_cat"),
    Concatenator(exclude=["y", "x_cat"], dtype=np.float32, output_column_name="x_cont"),
)
```
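For context, I apply the preprocessor roughly like this (the read path is a placeholder, not my real location):

```python
import ray

# Hypothetical path; my real data is a TB-scale tabular dataset.
ds = ray.data.read_parquet("s3://my-bucket/tabular-data/")

# fit_transform() runs the Concatenator map operations over the dataset blocks.
ds = preprocessor.fit_transform(ds)
```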
My cluster is a single node with 40 CPUs, 672 GB of RAM, and 8 GPUs.
I cannot get Ray to use all of the CPUs while preprocessing the data. Varying the object store memory, I observe the following CPU utilization:
- 10 GB object store memory: 1 / 40 CPUs used
- default object store memory: 9 / 40 CPUs used
- 610 GB (equal to /dev/shm): 27 / 40 CPUs used
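For reference, I set the object store size at startup roughly like this (the byte value is what I varied between the runs above):

```python
import ray

# Example: pin the object store at 610 GB (matching /dev/shm).
ray.init(object_store_memory=610 * 1024**3)
```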
How can I get Ray to use all CPUs while preprocessing data?