Ray dataset with multiple images per batch

ybrandt · September 1, 2023, 5:31pm

Hi,

I’d like to convert a PyTorch dataset to a Ray dataset.
Every batch has three keys: id (int), image (ndarray), and mask (ndarray).
However, I can’t find an out-of-the-box way in Ray Data to read two images from disk for each sample.

Currently, I see two options:

1. Read a CSV with the ids and use a transform step to read the images
2. Build a custom Datasource that reads two images from disk

Am I missing something?

Cheers,
Yannick

amogkam · September 1, 2023, 5:42pm

You can use the read_images API? See “Working with Images” in the Ray 2.6.1 docs.

ybrandt · September 1, 2023, 5:45pm

Thanks for the quick reply, but wouldn’t this just give me a dataset with either the images or the masks?

amogkam · September 1, 2023, 9:55pm

How is your data stored?

ybrandt · September 1, 2023, 10:02pm

data/
    images/
        1.tiff
        2.tiff
    masks/
        1.tiff
        2.tiff

amogkam · September 1, 2023, 10:36pm

Got it. Then I would go with your first approach.
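
For readers landing here later, a minimal sketch of that first approach, assuming the data/images and data/masks layout above, that each image and its mask share the same file name, and that Pillow is available for decoding TIFF files. The helper names (pair_paths, load_pair, build_dataset) are made up for illustration and are not part of Ray:

```python
import os
from functools import partial


def pair_paths(image_id, root="data"):
    """Return (image_path, mask_path) for one id, assuming matching file names."""
    name = f"{image_id}.tiff"
    return os.path.join(root, "images", name), os.path.join(root, "masks", name)


def load_pair(row, root="data"):
    """Per-row transform: read the image and its mask for row["id"]."""
    # Imported inside the function so the imports run on the Ray workers.
    import numpy as np
    from PIL import Image  # assumption: Pillow can decode these TIFFs

    image_path, mask_path = pair_paths(row["id"], root)
    return {
        "id": row["id"],
        "image": np.asarray(Image.open(image_path)),
        "mask": np.asarray(Image.open(mask_path)),
    }


def build_dataset(ids, root="data"):
    """Build a Ray dataset of ids, then lazily attach image and mask arrays."""
    import ray

    # The ids could equally come from a CSV via ray.data.read_csv.
    ds = ray.data.from_items([{"id": int(i)} for i in ids])
    return ds.map(partial(load_pair, root=root))
```

With this, build_dataset([1, 2]) should give rows that carry all three keys, so batches pulled with iter_batches would contain id, image, and mask together. If the images differ in shape, batching may need extra care (for example, resizing inside load_pair).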
