Streaming write using Ray's write_parquet for LLM inference


I need to run inference with vLLM on a large dataset; the code structure is as below:

import ray

ds = ray.data.read_parquet(my_input_path)
ds = ds.map_batches(
    LLMPredictor,
    concurrency=ray_concurrency,
    ...
    **resources_kwarg
)
ds.write_parquet(my_output_path)

My input data is an S3 path containing many Parquet files, each ~10MB.

What I observed is that, on each node, the write stage starts only after all inference tasks have finished. Is there a way to achieve a streaming write, e.g. write every n batches (see the sketch after the list below)?

The reasons are:

  1. During inference, only the GPUs are busy and the CPUs sit idle; I don't want to waste CPU resources in the meantime.
  2. If the dataset is large (~100GB), I don't want to hold the whole result in memory, which may cause OOM, and I want to see inference results as early as possible, as soon as they are generated.
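
For illustration, this is a minimal sketch of the kind of manual loop I have in mind, reusing the names from the snippet above. iter_batches() pulls result batches to the driver as they are produced; the flush interval n and the part-file naming are arbitrary choices of mine:

import pyarrow as pa
import pyarrow.parquet as pq
import ray

ds = ray.data.read_parquet(my_input_path)
ds = ds.map_batches(
    LLMPredictor,
    concurrency=ray_concurrency,
    **resources_kwarg
)

n = 8  # flush every n batches; arbitrary value
buffer, file_idx = [], 0
for batch in ds.iter_batches(batch_format="pandas"):
    buffer.append(pa.Table.from_pandas(batch))
    if len(buffer) >= n:
        # For an S3 destination, pass a pyarrow.fs.S3FileSystem via filesystem=.
        pq.write_table(pa.concat_tables(buffer),
                       f"{my_output_path}/part-{file_idx:05d}.parquet")
        buffer, file_idx = [], file_idx + 1
if buffer:  # flush the final partial chunk
    pq.write_table(pa.concat_tables(buffer),
                   f"{my_output_path}/part-{file_idx:05d}.parquet")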

Does Ray support this, and how can I achieve it?

Thank you

Hi @cnmdestroyer, from my experience with Ray Data, streaming writes are supported by default: Ray Data's streaming executor pipelines the read, map_batches, and write operators, so output files are written as result blocks are produced rather than after all inference finishes. How do you observe each node? Maybe you can check some details in the Ray dashboard, like below.
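
One way to check from code rather than the dashboard is to print the dataset's execution stats after the run. This is a minimal sketch assuming the pipeline from your question; the exact content of the stats report varies by Ray version:

import ray

ds = ray.data.read_parquet(my_input_path)
ds = ds.map_batches(
    LLMPredictor,
    concurrency=ray_concurrency,
    **resources_kwarg
)
ds.write_parquet(my_output_path)

# Per-operator timing report; overlapping operator wall times suggest the
# write ran concurrently with inference instead of waiting for it to finish.
print(ds.stats())

You can also simply watch the output prefix in S3 while the job is running; if part files appear before the map_batches operator completes, the write is streaming.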