Process/Materialize Data In Input Order

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

First of all, greetings to everyone. This is my first question as a newcomer. I am using Ray Data to read a CSV file, apply some map and filter functions, and materialize the results. Then I use iter_rows() to do post-processing. At this stage, I need to maintain the order of my input data, but I see that Ray changes the order of the data every time I re-run the code. Is there any way to tell either read_csv or materialize to maintain the input order? I would appreciate any expert advice on this.

The workaround I am thinking of is to add an extra column to my input data (a range of integers) and sort the dataset by that column before the materialize step. However, I assume this will increase processing time, since sorting is expensive.

Just wanted to give an update: while I got my workaround working, I also tried the following setting, suggested in a previous post that I had failed to notice before. It worked, and the results were correct.

ray.data.context.DatasetContext.get_current().execution_options.preserve_order = True
