I need to run inference with vLLM over a large dataset. The code structure is as below:
import ray

# Read all parquet files under the S3 prefix into a Ray Dataset
ds = ray.data.read_parquet(my_input_path)

# Run vLLM inference batch by batch with a callable predictor class
ds = ds.map_batches(
    LLMPredictor,
    concurrency=ray_concurrency,
    ...
    **resources_kwarg,
)

# Write the inference results back out as parquet
ds.write_parquet(my_output_path)
My input data is an S3 path containing many parquet files, each ~10MB.
What I observed is that on each node, the write process starts only after all inference jobs have finished. Is there a way to achieve streaming writes, i.e. write every n batches (a sketch of what I mean is after my questions below)?
The reasons are:
- During inference only the GPUs are working and the CPUs are idle; I don't want to waste the CPU resources during that time.
- If the dataset is large (~100GB), I don't want to hold the whole result in memory, which may cause OOM, and I want to see inference results earlier, as soon as they are generated.
Does Ray support this, and how can I achieve it?
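To make it concrete, this is roughly the chunked workaround I have in mind, just a sketch under my assumptions (the file listing, chunk_size, and the per-chunk output subfolders are placeholders I made up, not tested code):

# Rough sketch of what I mean: split the input files into chunks and run
# read -> map_batches -> write per chunk, so each chunk's results land on S3
# as soon as that chunk finishes instead of waiting for the whole dataset.
import ray

def run_chunked_inference(files, chunk_size=100):
    # `files`: list of parquet file paths under my_input_path
    # `chunk_size`: hypothetical number of files per chunk
    for i in range(0, len(files), chunk_size):
        chunk_files = files[i:i + chunk_size]
        # read_parquet also accepts a list of paths
        ds = ray.data.read_parquet(chunk_files)
        ds = ds.map_batches(
            LLMPredictor,
            concurrency=ray_concurrency,
            **resources_kwarg,
        )
        # write this chunk's results right away
        ds.write_parquet(f"{my_output_path}/chunk_{i // chunk_size}")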
Thank you