Using Ray Datasets with PyTorch Lightning

sahil711 · November 22, 2023, 3:24pm

I have an existing pipeline written in PyTorch Lightning for training a tabular deep learning model. It currently scales to ~50 million samples. I now want to scale it to 500 million samples, which is where Ray Datasets comes in.

I want to use Ray Datasets to load my data (straight from cloud storage) instead of the native PyTorch DataLoaders, while keeping the same PyTorch Lightning code for training the model.
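To make the idea concrete, here is a minimal sketch of the shape I have in mind, not a confirmed recipe. It assumes Lightning will accept any re-iterable object as a dataloader and that Ray's `Dataset.iter_torch_batches(batch_size=...)` yields dict batches keyed by column name. `ReiterableBatches` and `fake_batches` are hypothetical names; `fake_batches` only stands in for the Ray iterator so the snippet runs without Ray installed.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


class ReiterableBatches:
    """Dataloader-like wrapper (hypothetical helper).

    Each call to iter() asks the factory for a fresh pass over the data,
    so the same object can be reused across epochs.
    """

    def __init__(self, make_batches: Callable[[], Iterable[Any]]) -> None:
        self._make_batches = make_batches

    def __iter__(self) -> Iterator[Any]:
        return iter(self._make_batches())


def fake_batches(n_rows: int = 10, batch_size: int = 4) -> Iterator[Dict[str, List[int]]]:
    """Stand-in for a Ray batch iterator: yields dict batches keyed by column name."""
    for start in range(0, n_rows, batch_size):
        rows = list(range(start, min(start + batch_size, n_rows)))
        yield {"feature": rows, "label": [r % 2 for r in rows]}


loader = ReiterableBatches(lambda: fake_batches(n_rows=10, batch_size=4))

# Two "epochs": a bare generator would be exhausted after the first one.
epoch_sizes = [[len(batch["feature"]) for batch in loader] for _ in range(2)]
print(epoch_sizes)

# With Ray installed, the factory would instead look something like this
# (the path and batch size are assumptions, not values from my pipeline):
#   ds = ray.data.read_parquet("s3://my-bucket/train/")  # hypothetical path
#   loader = ReiterableBatches(lambda: ds.iter_torch_batches(batch_size=4096))
#   trainer.fit(model, train_dataloaders=loader)
```

With this shape, `training_step` would receive a dict of tensors per batch rather than a tuple, so the existing LightningModule would need to unpack columns by name.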

I was not able to find any documentation on this. Any help would be really appreciated.

Thanks!
