Ray dataset with multiple images per batch


I’d like to convert a PyTorch dataset to a Ray dataset.
Every batch has three keys: id (int), image (ndarray), and mask (ndarray).
However, I can’t find an out-of-the-box way in Ray Data to read two images (the image and its mask) from disk for each id.

Currently, I see two options:

  1. Read a CSV with the ids and use a transform step to read the images
  2. Build a custom Datasource that reads both images from disk

Am I missing something?


Could you use the read_images API? See "Working with Images" in the Ray 2.6.1 docs.

Thx for the quick reply, but wouldn’t this just give me a dataset with either the images or the masks, not both?

How is your data stored?


Got it. Then I would go with your first approach.
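A minimal sketch of that approach, under some assumptions not stated in the thread: the CSV has an `id` column, images and masks live at the hypothetical paths `images/<id>.png` and `masks/<id>.png` under a hypothetical `DATA_ROOT`, and Pillow is used to decode them. `ray.data.read_csv` and `Dataset.map` are real Ray Data calls; `load_sample`, `read_image`, and `build_dataset` are illustrative helper names. The per-row loader is kept free of Ray so it can be checked on its own.

```python
import os
from typing import Any, Callable, Dict

import numpy as np

# Hypothetical layout (assumption): <DATA_ROOT>/images/<id>.png and <DATA_ROOT>/masks/<id>.png
DATA_ROOT = "/path/to/data"


def read_image(path: str) -> np.ndarray:
    """Decode one image file into an ndarray (assumes Pillow is installed)."""
    from PIL import Image  # lazy import so this module loads without Pillow

    with Image.open(path) as img:
        return np.array(img)  # np.array copies the pixels before the file is closed


def load_sample(
    row: Dict[str, Any],
    root: str = DATA_ROOT,
    read_fn: Callable[[str], np.ndarray] = read_image,
) -> Dict[str, Any]:
    """Turn one CSV row {"id": ...} into a row with id, image, and mask."""
    sample_id = int(row["id"])  # CSV ids may arrive as numpy ints
    return {
        "id": sample_id,
        "image": read_fn(os.path.join(root, "images", f"{sample_id}.png")),
        "mask": read_fn(os.path.join(root, "masks", f"{sample_id}.png")),
    }


def build_dataset(csv_path: str):
    """Read the id CSV, then load both images for every row with map()."""
    import ray  # lazy import so load_sample can be used without Ray

    return ray.data.read_csv(csv_path).map(load_sample)
```

Usage would look like `ds = build_dataset("ids.csv")` followed by `ds.take(1)` to inspect one row. Doing the decoding inside `map` means the image reads run in parallel across Ray workers, which is the main thing the custom Datasource route would otherwise have to provide.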