Asynchronous dataset pipeline map

Seanny123 · April 23, 2022, 12:02am

While making this post, I figured out how I had to define a DatasetPipeline:

import ray

def prepend_a(val):
    return f"a{val}"


def append_b(val):
    return f"{val}b"


def append_c(val):
    return f"{val}c"

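data = ray.data.from_items([str(i) for i in range(10)])

# Convert the Dataset into a DatasetPipeline that processes 2 blocks per window.
pipe = data.window(blocks_per_window=2)
# First stage: prepend "a" to every row.
a_prepended = pipe.map(prepend_a)
# Second stage: produce both the "b" and "c" variants of each row as a tuple.
final = a_prepended.map(lambda x: (append_b(x), append_c(x)))

for bb, cc in final.iter_rows():
    print(bb, cc)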
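
For anyone who wants to check what rows the pipeline above should yield without starting Ray, here is a minimal plain-Python sketch of the same two map stages. It is only an illustration, not how Ray executes anything: the `windows` and `run_pipeline` helpers are hypothetical names I made up, and a "window" here is simply a chunk of rows, whereas `blocks_per_window` in Ray counts blocks.

```python
from itertools import islice


def prepend_a(val):
    return f"a{val}"


def append_b(val):
    return f"{val}b"


def append_c(val):
    return f"{val}c"


def windows(items, size):
    """Yield consecutive lists of at most `size` items (hypothetical stand-in for pipeline windows)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_pipeline(items, window_size=2):
    """Lazily apply both map stages, one window at a time (illustrative only, no parallelism)."""
    for window in windows(items, window_size):
        # Stage 1: prepend "a" to each row in the current window.
        a_prepended = (prepend_a(x) for x in window)
        # Stage 2: emit the "b" and "c" variants of each row as a tuple.
        for x in a_prepended:
            yield append_b(x), append_c(x)


results = list(run_pipeline([str(i) for i in range(10)]))
for bb, cc in results:
    print(bb, cc)
```

Each row comes out as a `(…b, …c)` pair in input order, which is the shape the `for bb, cc in final.iter_rows()` loop above unpacks.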
