Directory structure dataset help

ya_boi · July 4, 2023, 1:42am

Hi everybody,

I have a challenge. I have grown accustomed to the PyTorch dataset framework, where I create a dataset by writing a custom PyTorch Dataset object. My dataset is generated with NVIDIA Omniverse, and all of the data lives in a directory structure: a folder of RGB images, a folder of segmentation images, a folder of JSON files, etc. When I build the dataset the PyTorch way, I initialize it by creating a table of paths, so that when __getitem__ is called I look up the paths for that item and load each piece of data accordingly. However, I was swayed when I learned about the parallelism, sharding, and storage abstractions (local filesystem or S3, for example) of Ray Datasets. The problem is that I don't see a clear way to transform my directory-structured dataset into a Ray dataset. If anybody has any suggestions on how to go about this, I would deeply appreciate it!
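For reference, here is a rough sketch of the path-table idea described above and one possible way to hand it to Ray. The folder names (rgb, segmentation, json), the assumption that files for one sample share a file stem across folders, and the helper names build_index, load_sample, and to_ray_dataset are all hypothetical; they are not taken from my actual data layout. The Ray part relies on ray.data.from_items and Dataset.map with dict rows, and it only reads local paths, so treat it as a starting point rather than a working answer.

```python
from pathlib import Path


def build_index(root, folders=("rgb", "segmentation", "json")):
    """Return one record per sample: {"id": stem, "<folder>": path, ...}.

    Assumption (hypothetical layout): files that belong to the same sample
    share a file stem across folders, e.g. rgb/0001.png and json/0001.json.
    """
    root = Path(root)
    by_stem = {}
    for folder in folders:
        for path in sorted((root / folder).iterdir()):
            if path.is_file():
                by_stem.setdefault(path.stem, {})[folder] = str(path)
    # Keep only samples that have a file in every folder.
    return [
        {"id": stem, **paths}
        for stem, paths in sorted(by_stem.items())
        if len(paths) == len(folders)
    ]


def load_sample(row):
    """Read the raw payloads for one index row (the __getitem__ equivalent)."""
    return {
        "id": row["id"],
        "rgb": Path(row["rgb"]).read_bytes(),
        "segmentation": Path(row["segmentation"]).read_bytes(),
        "json": Path(row["json"]).read_text(),
    }


def to_ray_dataset(records):
    """Turn the path table into a Ray dataset that loads samples lazily."""
    import ray  # imported here so the index code works without Ray installed

    # Each record becomes one row; map() applies load_sample in parallel.
    return ray.data.from_items(records).map(load_sample)
```

Something like to_ray_dataset(build_index("/path/to/dataset")) would then give a dataset of rows holding raw bytes and JSON text; decoding the images would still have to happen in load_sample or in a later map step.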

