How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have created Parquet files that contain tensor data.
When I try to read them as a Ray Dataset by passing the 'tensor_column_schema' argument, as below:
ds = ray.data.read_parquet(path, tensor_column_schema={"data": (np.float32, (2, 2, 3))})
I get this error:
TypeError: from_fragment() got an unexpected keyword argument 'tensor_column_schema'
I also tried passing it through dataset_kwargs, as below:
ds = ray.data.read_parquet(path, dataset_kwargs={"tensor_column_schema": {"data": (np.float32, (2, 2, 3))}})
which gives this error:
  File "/opt/conda/lib/python3.8/site-packages/ray/data/datasource/parquet_datasource.py", line 61, in prepare_read
    pq_ds = pq.ParquetDataset(
TypeError: __new__() got an unexpected keyword argument 'tensor_column_schema'
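Both errors are raised for an unexpected keyword argument, so one thing worth checking is which keyword names the installed function actually declares. The following is a minimal sketch using only the standard library; `named_parameters`, `forwards_extra_kwargs`, and `example_reader` are hypothetical names introduced here for illustration and are not part of Ray.

```python
import inspect


def named_parameters(func):
    """Return the explicitly named parameters of func (skips *args/**kwargs)."""
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind not in skip
    ]


def forwards_extra_kwargs(func):
    """Return True if func declares **kwargs and so accepts unknown keywords."""
    return any(
        param.kind is inspect.Parameter.VAR_KEYWORD
        for param in inspect.signature(func).parameters.values()
    )


# Hypothetical stand-in for a reader function, used only to exercise the helpers.
def example_reader(paths, *, columns=None, **reader_args):
    return paths


print(named_parameters(example_reader))
print(forwards_extra_kwargs(example_reader))
```

Calling `named_parameters(ray.data.read_parquet)` against the installed Ray would show whether 'tensor_column_schema' is a named parameter in that version. If it is not, and the function forwards extra keywords, that would be consistent with the TypeError being raised further down the call stack, as in the traceback above.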
Versions / Dependencies
Ray 1.9.0, installed through pip.
Reproduction script
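import numpy as np
import pandas as pd
import ray

# Example output directory; any writable path works.
spath = "/tmp/some_path"

# Three 2x2x2 int8 tensors, each serialized to an opaque byte blob.
arr = np.arange(24, dtype=np.int8).reshape((3, 2, 2, 2))
df = pd.DataFrame({
    "one": [1, 2, 3],
    "two": [tensor.tobytes() for tensor in arr]})

# Write the blobs to Parquet.
ds = ray.data.from_pandas([df])
ds.write_parquet(spath)

# Read them back, asking Ray to cast column "two" to a tensor column.
# This is the call that fails on Ray 1.9.0.
ds = ray.data.read_parquet(
    spath, tensor_column_schema={"two": (np.int8, (2, 2, 2))})
print(ds.schema())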
The sample snippet is adapted from the Ray documentation:
https://docs.ray.io/en/latest/data/dataset-tensor-support.html#:~:text=Cast%20from%20data%20stored%20in%20C-contiguous%20format%3A
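For reference, the cast I expect from the schema amounts to reinterpreting each byte blob with a given dtype and shape. The sketch below does that with NumPy alone, independent of Ray; `decode_tensor` is a hypothetical helper written for this post, not Ray's implementation, and the dtype is passed explicitly so the blob size matches the schema.

```python
import numpy as np


def decode_tensor(blob, dtype, shape):
    """Reinterpret a C-contiguous byte blob as an ndarray of dtype and shape."""
    expected = np.dtype(dtype).itemsize * int(np.prod(shape))
    if len(blob) != expected:
        raise ValueError(f"blob has {len(blob)} bytes, expected {expected}")
    return np.frombuffer(blob, dtype=dtype).reshape(shape)


# Serialize three 2x2x2 int8 tensors to bytes, as in the reproduction script.
arr = np.arange(24, dtype=np.int8).reshape((3, 2, 2, 2))
blobs = [tensor.tobytes() for tensor in arr]

# Decode each blob and stack the results back into one array.
decoded = np.stack([decode_tensor(b, np.int8, (2, 2, 2)) for b in blobs])
print(decoded.shape)
```

The length check matters because the dtype used when writing and the dtype in the schema must agree; a blob written with a wider integer type would not fit an int8 schema of the same shape.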