Ray Column With Custom Python Dataclass Type

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Can you store custom Python dataclasses in Ray datasets? If not, what is the recommended way to create a column with structured data?

For example:
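from dataclasses import dataclass

import numpy as np


@dataclass
class ImageMetadata:
    x_resolution: float
    y_resolution: float
    file_name: str


@dataclass
class ImageTile:
    tile_values: np.ndarray  # pixel values for this tile
    tile_metadata: ImageMetadata  # metadata of the source image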

I’d like ImageTile to be a column type in my Ray dataset.

You can do that, but the recommended approach is to use NumPy ndarrays.

Thanks! Is there any advice on how to make the NumPy arrays more readable? For example, if I had an array of [x_resolution, y_resolution], it would be error-prone to have to remember that x_resolution is in position 0.
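One possible way around the positional-index problem, sketched here rather than confirmed by anyone in this thread, is to flatten the dataclass into one named field per value so that x_resolution is looked up by name instead of by position. The helper metadata_to_row is a hypothetical name introduced for illustration, and the sketch uses only the standard library. It assumes that a list of such dicts could then be handed to something like ray.data.from_items, which is not shown or tested here.

```python
from dataclasses import asdict, dataclass


@dataclass
class ImageMetadata:
    x_resolution: float
    y_resolution: float
    file_name: str


def metadata_to_row(meta: ImageMetadata) -> dict:
    """Hypothetical helper: flatten a dataclass into a dict keyed by field name."""
    return asdict(meta)


# Each dict is one row; each key would become a named column.
rows = [
    metadata_to_row(ImageMetadata(0.5, 0.25, "a.tif")),
    metadata_to_row(ImageMetadata(1.0, 1.0, "b.tif")),
]

# Assumption: a list of dicts like this could be passed to ray.data.from_items(rows).
print(rows[0]["x_resolution"])
```

With named fields, a typo in a key fails loudly with a KeyError, whereas a wrong array index silently returns the wrong value.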

Using NumPy arrays might also make it hard for me to show relationships between the data in the columns. For example, if I have an image and then apply a flat_map to create tiles, it would be nice to represent that relationship (an image has tiles) in the dataset. Do you have any thoughts on how to do that in a readable way?
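One way the image-to-tiles relationship could be kept visible, offered as a rough sketch and not as an official recommendation, is to have each tile row carry its parent image's file_name plus its own tile coordinates. The function image_to_tiles, the "pixels" key, and the tile_row/tile_col column names are all hypothetical, and nested lists stand in for NumPy arrays so the example runs with the standard library alone. The function takes one row and returns a list of rows, which is the shape a flat_map step would presumably expect.

```python
from typing import Any, Dict, List


def image_to_tiles(row: Dict[str, Any], tile_size: int = 2) -> List[Dict[str, Any]]:
    """Hypothetical helper: split one image row into several tile rows.

    Every tile row repeats the parent's file_name, so the
    "an image has tiles" relationship survives in the output.
    """
    pixels = row["pixels"]  # nested lists standing in for a 2D ndarray
    tiles = []
    for top in range(0, len(pixels), tile_size):
        for left in range(0, len(pixels[0]), tile_size):
            tile = [r[left:left + tile_size] for r in pixels[top:top + tile_size]]
            tiles.append({
                "file_name": row["file_name"],  # link back to the parent image
                "tile_row": top // tile_size,
                "tile_col": left // tile_size,
                "tile_values": tile,
            })
    return tiles


image_row = {
    "file_name": "a.tif",
    "pixels": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]],
}

# Assumption: with Ray this would be applied as ds.flat_map(image_to_tiles).
tiles = image_to_tiles(image_row)
print(len(tiles), tiles[0]["tile_values"])
```

Grouping or filtering on file_name afterwards would then recover all tiles that belong to a given image.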