Ray Data's mean() behaves inconsistently when computing the mean of a NumPy array column
import numpy as np
import ray
# Create a random Ray dataset.
dataset = ray.data.from_items([
{"vector": np.random.rand(10)} for _ in range(1000)
])
For the dataset above, computing the mean with the following call
dataset.mean(on="vector")
fails with this error:
pyarrow.lib.ArrowNotImplementedError: Function 'sum' has no kernel matching input types (extension<ray.data.arrow_tensor>)
However, the following call runs without an error:
dataset.select_columns(["vector"]).mean()
What is the cause of this inconsistency?
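The error message shows that the aggregation reaches a PyArrow `sum` kernel that has no implementation for the `extension<ray.data.arrow_tensor>` type. While the root cause is investigated, one possible workaround is a minimal sketch that computes the element-wise mean vector (shape `(10,)`) by accumulating sums over NumPy batches. This assumes the element-wise mean is the desired result rather than a single scalar. `streaming_vector_mean` is a hypothetical helper, not part of Ray, and here it runs on synthetic batches so it needs only NumPy.

```python
from typing import Dict, Iterable

import numpy as np


def streaming_vector_mean(
    batches: Iterable[Dict[str, np.ndarray]], column: str = "vector"
) -> np.ndarray:
    """Element-wise mean of a fixed-shape tensor column, accumulated batch by batch.

    Hypothetical helper: each batch is assumed to be a dict mapping column
    names to NumPy arrays of shape (rows, *element_shape).
    """
    total = None
    count = 0
    for batch in batches:
        values = np.asarray(batch[column], dtype=np.float64)
        batch_sum = values.sum(axis=0)  # sum over rows, keep the vector shape
        total = batch_sum if total is None else total + batch_sum
        count += values.shape[0]
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Synthetic stand-in for the dataset: 1000 rows of 10-dim vectors, in batches of 256.
rng = np.random.default_rng(0)
data = rng.random((1000, 10))
fake_batches = ({"vector": data[i:i + 256]} for i in range(0, 1000, 256))
result = streaming_vector_mean(fake_batches)
print(result.shape)
```

With Ray, the assumption is that you would pass `dataset.iter_batches(batch_format="numpy")`, which in Ray 2.x yields dicts mapping column names to NumPy arrays. Accumulating a running sum and count keeps memory proportional to one batch instead of materializing the whole column.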