How to run map_batches function in the same order as the blocks in the block_list

veryhannibal · April 12, 2023, 10:40am

OK, thank you very much for your help. So far I have found a workaround that, while not optimal, solves my problem for now: I split the original dataset into several smaller datasets, each with fewer blocks, and then process those datasets one after another, with the blocks inside each dataset still processed in parallel.
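A minimal sketch of the scheduling shape of this workaround, assuming plain Python threads as a stand-in for Ray. This is not Ray code: `run_in_chunks`, `process_block`, and `chunk_size` are hypothetical names. Each chunk plays the role of one of the smaller datasets, and `process_block` plays the role of the `map_batches` UDF. Blocks inside a chunk run in parallel, but a chunk is submitted only after the previous chunk has finished, so the output follows the original block order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_chunks(
    blocks: Sequence[T],
    process_block: Callable[[T], R],
    chunk_size: int,
    max_workers: int = 4,
) -> List[R]:
    """Hypothetical helper: run chunks sequentially, blocks within a chunk in parallel."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results: List[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(blocks), chunk_size):
            chunk = blocks[start:start + chunk_size]
            # pool.map yields results in input order; extend() consumes the
            # whole iterator, so this chunk fully finishes before the next
            # chunk is submitted (a barrier between chunks).
            results.extend(pool.map(process_block, chunk))
    return results
```

The barrier between chunks is also why this is not optimal: while the last blocks of a chunk are still running, idle workers cannot start blocks from the next chunk.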
If I find a better solution, I will share it here.
I'm very happy to be here discussing technology with you.
