Ray dataset with multiple images per batch

Hi,

I’d like to convert a PyTorch dataset to a Ray dataset.
Every batch has three keys: id (int), image (ndarray), and mask (ndarray).
However, I can’t find an out-of-the-box way in Ray Data to read two images (the image and its mask) from disk for each sample.

Currently, I see two options:

  1. Read a CSV with the ids and use the transform step to read the images
  2. Build a custom Datasource that reads both images from disk

Am I missing something?

Cheers,
Yannick

Can you use the read_images API? See "Working with Images" in the Ray 2.6.1 docs.

Thanks for the quick reply, but wouldn’t this just give me a dataset with either the images or the masks, not both?

How is your data stored?

data/
    images/
        1.tiff
        2.tiff
    masks/
        1.tiff
        2.tiff

Got it. Then I would go with your first approach.
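A minimal sketch of the first approach, assuming the data/images and data/masks layout above, numeric file stems as ids, and Pillow plus NumPy for decoding the TIFFs. The helper names build_index and make_loader are hypothetical, not part of Ray. Instead of reading a CSV, it builds the same id table from the directory listing; a CSV with id, image_path and mask_path columns would serve the same purpose.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def read_tiff(path: str):
    """Default reader: decode a TIFF into a NumPy array.

    Assumption: Pillow and NumPy are installed. They are imported lazily so
    the index-building code below has no third-party dependencies.
    """
    import numpy as np
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img)


def build_index(root: str) -> List[Row]:
    """Pair <root>/images/<id>.tiff with <root>/masks/<id>.tiff by file stem.

    Returns one row per id (sorted numerically), i.e. the table a CSV of ids
    would provide. Assumes every file stem is an integer id.
    """
    root_path = Path(root)
    images = {p.stem: p for p in (root_path / "images").glob("*.tiff")}
    masks = {p.stem: p for p in (root_path / "masks").glob("*.tiff")}

    # Fail early if an image has no mask (or vice versa).
    unmatched = images.keys() ^ masks.keys()
    if unmatched:
        raise ValueError(f"ids without a matching image/mask: {sorted(unmatched)}")

    return [
        {"id": int(stem), "image_path": str(images[stem]), "mask_path": str(masks[stem])}
        for stem in sorted(images, key=int)
    ]


def make_loader(reader: Callable[[str], Any] = read_tiff) -> Callable[[Row], Row]:
    """Return a row-wise function: a row of paths in, a row of decoded arrays out."""

    def load_pair(row: Row) -> Row:
        return {
            "id": row["id"],
            "image": reader(row["image_path"]),
            "mask": reader(row["mask_path"]),
        }

    return load_pair
```

With Ray 2.x, the index can be turned into a dataset with ray.data.from_items(build_index("data")), and the images can then be loaded row by row with .map(make_loader()), so each output row should carry the id, image and mask keys. Decoding inside the map step keeps reading lazy and distributed across workers, and avoids writing a custom Datasource for a layout this simple.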