I am using the Ray AIR TorchTrainer on my Ray cluster. The dataset is created from a CSV file and then consumed by multiple workers. The issue is that when I call TorchTrainer, the whole file appears to be read into memory by a single node. Is there any way to avoid loading the complete file into memory?
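For context, here is a minimal sketch of the pattern I mean, using `ray.data.read_csv` and passing the dataset to `TorchTrainer` (the path, batch size, and worker count are placeholders, not my actual config):

```python
import ray
from ray.air import session
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer

# Read the CSV as a Ray Dataset (path is illustrative).
ds = ray.data.read_csv("s3://my-bucket/data.csv")

def train_loop_per_worker(config):
    # Each worker is supposed to pull only its shard of the dataset.
    shard = session.get_dataset_shard("train")
    for epoch in range(config["num_epochs"]):
        for batch in shard.iter_torch_batches(batch_size=config["batch_size"]):
            ...  # forward/backward pass here

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"num_epochs": 1, "batch_size": 256},
    scaling_config=ScalingConfig(num_workers=4),
    datasets={"train": ds},
)
result = trainer.fit()
```

Even with this setup, it looks like the full CSV ends up materialized on one node before the workers start consuming it.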