Ray.data read_parquet ‘tensor_column_schema’ argument issue

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I have created parquet files with tensor data.

When I tried to read them as a Ray Dataset by passing the ‘tensor_column_schema’ argument, as below:

ds = ray.data.read_parquet(path, tensor_column_schema={"data": (np.float32, (2, 2, 3))})

I get this error:
TypeError: from_fragment() got an unexpected keyword argument 'tensor_column_schema'

I also tried with dataset_kwargs as below

ds = ray.data.read_parquet(path, dataset_kwargs={"tensor_column_schema": {"data": (np.float32, (2, 2, 3))}})

which gives this error:

  File "/opt/conda/lib/python3.8/site-packages/ray/data/datasource/parquet_datasource.py", line 61, in prepare_read
    pq_ds = pq.ParquetDataset(
TypeError: __new__() got an unexpected keyword argument 'tensor_column_schema'
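
Both tracebacks are consistent with the keyword being forwarded to pyarrow instead of being handled by Ray itself. One way to check this up front is to test whether a function declares the argument by name. This is only a minimal sketch: `declares_kwarg` and the two stand-in readers are hypothetical helpers for illustration, not Ray's real signatures.

```python
import inspect


def declares_kwarg(func, name):
    """Return True if `func` lists `name` as an explicit parameter.

    A catch-all **kwargs does not count, because such arguments may simply
    be forwarded to another library and rejected there.
    """
    param = inspect.signature(func).parameters.get(name)
    return param is not None and param.kind is not inspect.Parameter.VAR_KEYWORD


# Stand-ins for illustration only (hypothetical, not Ray's real signatures).
def old_reader(paths, **arrow_parquet_args):
    return paths


def new_reader(paths, tensor_column_schema=None, **arrow_parquet_args):
    return paths


print(declares_kwarg(old_reader, "tensor_column_schema"))  # False
print(declares_kwarg(new_reader, "tensor_column_schema"))  # True
```

With Ray installed, the same check could be run as `declares_kwarg(ray.data.read_parquet, "tensor_column_schema")`, assuming the supporting version declares the argument explicitly.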

Versions / Dependencies

Ray 1.9.0, installed through pip.

Reproduction script

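import ray
import numpy as np
import pandas as pd

# Placeholder output directory; `spath` was not defined in the original post.
spath = "/tmp/tensor_parquet"

# Each tensor is stored as an opaque blob of bytes. The dtype is set
# explicitly so the blobs match the (np.int8, (2, 2, 2)) schema used below.
arr = np.arange(24, dtype=np.int8).reshape((3, 2, 2, 2))
df = pd.DataFrame({
    "one": [1, 2, 3],
    "two": [tensor.tobytes() for tensor in arr]})

# Write the dataset to Parquet so there is something to read back
# (this step was missing from the original post).
ds = ray.data.from_pandas([df])
ds.write_parquet(spath)

# On Ray 1.9.0 this call raises the TypeError shown above.
ds = ray.data.read_parquet(
    spath, tensor_column_schema={"two": (np.int8, (2, 2, 2))})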


sample-snippet grabbed from here

Updating Ray to 2.2.0 fixed the issue; 1.9.0 does not support the tensor_column_schema argument.
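
If upgrading is not possible, the blobs written with `tensor.tobytes()` can in principle be decoded by hand after reading the column as plain bytes. The following is a minimal sketch using only NumPy. `decode_tensor` is a hypothetical helper, and the dtype and shape passed to it must match what was serialized.

```python
import numpy as np


def decode_tensor(blob, dtype, shape):
    """Rebuild an ndarray from the bytes produced by ndarray.tobytes()."""
    expected = np.dtype(dtype).itemsize * int(np.prod(shape))
    if len(blob) != expected:
        raise ValueError(
            f"blob has {len(blob)} bytes, but dtype {np.dtype(dtype).name} "
            f"with shape {shape} needs {expected}"
        )
    # frombuffer returns a read-only view; copy() makes the result writable.
    return np.frombuffer(blob, dtype=dtype).reshape(shape).copy()


# Same data layout as the reproduction script, with an explicit int8 dtype.
arr = np.arange(24, dtype=np.int8).reshape((3, 2, 2, 2))
blobs = [tensor.tobytes() for tensor in arr]

decoded = np.stack([decode_tensor(b, np.int8, (2, 2, 2)) for b in blobs])
print(np.array_equal(decoded, arr))  # True
```

The size check is there because a dtype mismatch (for example, int64 data decoded as int8) would otherwise only surface as a confusing reshape error.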