Does `ray.data.Dataset.iter_batches` guarantee order of the original file?

fengsp · May 17, 2023, 3:41am

For example, if we read a CSV or Parquet file into a Ray dataset, would the following two loops print rows in the same order?

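import pandas as pd

# ray dataset: request pandas batches so each batch is a DataFrame
for batch in ray_dataset.iter_batches(batch_format="pandas"):
    for _, row in batch.iterrows():
        print(row)
# raw file: iterrows() yields (index, row) pairs
for _, row in pd.read_parquet('example.parquet').iterrows():
    print(row)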

What about a shard of the dataset? Would it keep the same order as the original file?

shard1, _, _ = ds.split_at_indices([2000, 5000])
# same order as original_file[:2000]?
for batch in shard1.iter_batches():
    print(batch)

fengsp · May 25, 2023, 2:50am

Can I get any help here? No explicit docs could be found for this.

gjoliver · May 26, 2023, 4:40pm

Can you try setting `ray.data.context.DatasetContext.get_current().execution_options.preserve_order = True`, like the example here?
https://docs.ray.io/en/latest/data/glossary.html#term-Batch-format
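
A minimal sketch of that setting, for anyone landing here. It assumes a Ray release that has `execution_options` on the data context (per this thread, 2.2.0 does not). The helper name `enable_preserve_order` is made up for illustration, and the lookup of `ray.data.DataContext` reflects my understanding that newer releases renamed `DatasetContext`; treat both as assumptions and check them against your installed version.

```python
def enable_preserve_order(ctx=None):
    """Turn on preserve_order on a Ray Data context and return the context.

    `ctx` is optional so the helper can be exercised without Ray installed;
    when omitted, the current Ray Data context is looked up.
    """
    if ctx is None:
        import ray.data  # imported lazily; requires Ray to be installed

        # Assumption: newer releases expose DataContext, older ones use
        # DatasetContext (the name quoted in the post above).
        ctx_cls = getattr(ray.data, "DataContext", None)
        if ctx_cls is None:
            from ray.data.context import DatasetContext as ctx_cls
        ctx = ctx_cls.get_current()

    # The option from the post above: keep output in input order.
    ctx.execution_options.preserve_order = True
    return ctx
```

Call `enable_preserve_order()` once before building and iterating the dataset, then iterate with `iter_batches()` as usual.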

fengsp · May 28, 2023, 10:41am

It seems this option is not supported in ray==2.2.0. How can I get it for Ray 2.2.0? So order is not preserved in every version of Ray?

gjoliver · May 28, 2023, 5:00pm

Ray Dataset is under very active development.
I’d actually recommend always using the latest version for its performance and functionality improvements.

fengsp · May 29, 2023, 3:49am

I am using a library that requires ray==2.2.0. Anyway, I will implement it myself without the Ray Dataset API, since Ray Dataset does not work for this.
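
For reference, one way such a do-it-yourself reader might look. This is only a sketch using the Python standard library for the CSV case, not what was actually implemented; `iter_csv_batches` and its `batch_size` parameter are hypothetical names. Because a single reader walks the file sequentially, batches come out in file order.

```python
import csv
from itertools import islice


def iter_csv_batches(path, batch_size=256):
    """Yield lists of row dicts from a CSV file, strictly in file order."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            # Pull the next batch_size rows from the single sequential reader.
            batch = list(islice(reader, batch_size))
            if not batch:
                return
            yield batch
```

The last batch may be shorter than `batch_size`. The trade-off is that reading is single-threaded, which is exactly what makes the order deterministic.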
