Using Ray Plasma Effectively

SumanthDatta · April 9, 2021, 6:43pm

Hi,

We are using the RaySGD Dataset API for our training; a parallel iterator reads data from our persistent store.

During every training epoch, the iterator fetches the data again from the persistent store. Ray Plasma is filled up to some extent. Instead of reading from the persistent store so often, the iterator should read most of the data from the Plasma store. How can we achieve that effectively?

sangcho · April 9, 2021, 7:06pm

cc @rliaw Can you answer his question?

rliaw · April 9, 2021, 7:12pm

Maybe you could try using the Dataset API? The MLDataset API (cc @Kai_Huang) could also be useful here.

https://docs.ray.io/en/master/raysgd/raysgd_dataset.html

SumanthDatta · April 10, 2021, 2:09am

@rliaw We are already using the RaySGD Dataset API.

Kai_Huang · April 12, 2021, 1:52am

Thanks @rliaw, I'll look at it.
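
For readers landing on this thread: the pattern being asked about (read from the persistent store once, then serve every epoch from the object store) can be sketched roughly as below. This is only an illustration under stated assumptions, not the RaySGD Dataset implementation. `load_shard`, `cache_shards`, `epoch_batches`, and `LocalStore` are hypothetical names, and `LocalStore` is an in-process stand-in so the sketch runs without a cluster.

```python
from typing import Any, Callable, Iterable, Iterator, List


def load_shard(shard_id: int) -> List[int]:
    """Hypothetical stand-in for an expensive read from the persistent store."""
    load_shard.calls += 1
    return [shard_id * 10 + i for i in range(3)]


load_shard.calls = 0  # counts how often the persistent store is hit


def cache_shards(shard_ids: Iterable[int], put: Callable[[Any], Any]) -> List[Any]:
    """Read every shard once and keep the handle returned by the store."""
    return [put(load_shard(s)) for s in shard_ids]


def epoch_batches(refs: List[Any], get: Callable[[Any], Any]) -> Iterator[Any]:
    """Yield shards by resolving cached handles instead of re-reading them."""
    for ref in refs:
        yield get(ref)


class LocalStore:
    """In-process stand-in for an object store (assumption, not a Ray class)."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, value: Any) -> int:
        key = len(self._objects)
        self._objects[key] = value
        return key

    def get(self, key: int) -> Any:
        return self._objects[key]


store = LocalStore()
# Populate the store once, before the epoch loop; keep the handles alive.
refs = cache_shards(range(4), store.put)

for epoch in range(3):
    # Every epoch reads from the store; load_shard is not called again.
    total = sum(sum(shard) for shard in epoch_batches(refs, store.get))
```

With Ray itself, the same shape should work by passing `ray.put` and `ray.get` in place of `store.put` and `store.get` after `ray.init()`. Two things to keep in mind: the returned object references have to stay in scope for the whole training run, since Ray reference-counts objects and can release them once no reference remains, and the object store has to be large enough to hold the cached shards (`ray.init` accepts an `object_store_memory` argument for this).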
