Ray.data read_parquet ‘tensor_column_schema’ argument issue

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I have created parquet files with tensor data.

When I try to read them as a Ray dataset by passing the 'tensor_column_schema' argument as below:

ds = ray.data.read_parquet(path, tensor_column_schema={"data": (np.float32, (2, 2, 3))})

I get this error:
TypeError: from_fragment() got an unexpected keyword argument 'tensor_column_schema'

I also tried with dataset_kwargs as below

ds = ray.data.read_parquet(path, dataset_kwargs={"tensor_column_schema": {"data": (np.float32, (2, 2, 3))}})

which gives this error:

File "/opt/conda/lib/python3.8/site-packages/ray/data/datasource/parquet_datasource.py", line 61, in prepare_read
    pq_ds = pq.ParquetDataset(
TypeError: __new__() got an unexpected keyword argument 'tensor_column_schema'

Versions / Dependencies

Ray 1.9.0, installed through pip.

Reproduction script

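import ray
import os, glob
import numpy as np
import pandas as pd

# Example output directory; the original post did not define spath.
spath = "/tmp/tensor_parquet"

# Explicit int8 so each tensor serializes to 8 bytes,
# matching the (np.int8, (2, 2, 2)) schema used below.
arr = np.arange(24, dtype=np.int8).reshape((3, 2, 2, 2))
df = pd.DataFrame({
    "one": [1, 2, 3],
    "two": [tensor.tobytes() for tensor in arr]})

ds = ray.data.from_pandas([df])
ds.write_parquet(spath)

ds = ray.data.read_parquet(
    spath, tensor_column_schema={"two": (np.int8, (2, 2, 2))})

print(ds.schema())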

The sample snippet is taken from the Ray docs:
https://docs.ray.io/en/latest/data/dataset-tensor-support.html#:~:text=Cast%20from%20data%20stored%20in%20C-contiguous%20format%3A
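
For anyone wondering what the (dtype, shape) pair in the schema is for: as far as I understand it, it describes how to turn the raw bytes in the column back into tensors. Below is a rough sketch of that cast in plain NumPy. It is only an illustration, not Ray's actual implementation, and the decode helper is a name I made up. The point is that the dtype and shape have to match what was used when the bytes were written.

```python
import numpy as np

# Three (2, 2, 2) tensors with an explicit dtype, so each serializes to 8 bytes.
arr = np.arange(24, dtype=np.int8).reshape((3, 2, 2, 2))
raw = [tensor.tobytes() for tensor in arr]


def decode(buf, dtype, shape):
    """Hypothetical helper: rebuild one tensor from C-contiguous bytes."""
    return np.frombuffer(buf, dtype=dtype).reshape(shape)


# Decoding with the same dtype and shape recovers the original tensors.
restored = np.stack([decode(b, np.int8, (2, 2, 2)) for b in raw])
```

If the dtype passed to decode is wider than the one used for tobytes(), the element count no longer matches the shape and the reshape fails.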

Updating Ray to 2.2.0 fixed the issue; 1.9.0 doesn't support the tensor_column_schema argument.
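
In case it helps someone else, here is a small sketch of a version guard to run before passing tensor_column_schema. The 2.2.0 threshold is an assumption: it is simply the version that worked for me. Releases between 1.9.0 and 2.2.0 may support the argument too, but I have not checked. The helper names are my own.

```python
import re

# Assumption: 2.2.0 is the only version confirmed to work in this thread.
KNOWN_GOOD = (2, 2, 0)


def parse_version(version):
    """Turn a string such as '2.2.0' or '2.2.0rc1' into a tuple like (2, 2, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_known_good(version):
    """True if the version is at least the one confirmed to work."""
    return parse_version(version) >= KNOWN_GOOD
```

With Ray installed, this can be called as is_known_good(ray.__version__) before the read_parquet call.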