I have an existing pipeline written in PyTorch Lightning for training a tabular deep learning model, and I am able to scale it to about 50M samples. However, I want to scale it to 500M samples, which is where Ray Datasets comes in.
I want to use Ray Datasets to load my data (straight from cloud storage) instead of the native PyTorch DataLoaders, while keeping the same PyTorch Lightning code for training the model.
I was not able to find any documentation on this. Any help would be really appreciated.