Offline inference with vLLM: map_batches vs build_llm_processor

Yes: if you want absolute control over batch composition and ordering, you can manually create a Ray Dataset for each batch and run the processor on it, as in your pseudo-code. This guarantees that each batch is processed exactly as you define it, but it bypasses Ray Data's internal batching, pipelining, and parallelism, and may be less efficient for large-scale workloads (Ray Data docs).
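The per-batch pattern above can be sketched as follows. This is a minimal illustration of the control-flow idea only: `stub_processor` stands in for the real LLM processor (which would come from `ray.data.llm.build_llm_processor` and be invoked on `ray.data.from_items(batch)`), and all names here are illustrative.

```python
def stub_processor(batch):
    # Placeholder for processor(ray.data.from_items(batch)); a real
    # processor would run vLLM generation over the batch. Here we just
    # uppercase the prompt so the example is self-contained.
    return [{"prompt": row["prompt"], "output": row["prompt"].upper()}
            for row in batch]

def run_in_fixed_batches(rows, batch_size):
    """Process rows in strictly ordered, caller-defined batches."""
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]   # exact batch boundaries
        results.extend(stub_processor(batch))    # one processor call per batch
    return results

rows = [{"prompt": p} for p in ["a", "b", "c", "d", "e"]]
out = run_in_fixed_batches(rows, batch_size=2)
print([r["output"] for r in out])  # ['A', 'B', 'C', 'D', 'E']
```

Because each batch is built and submitted by your own loop, ordering and batch membership are fully deterministic, which is exactly the property the manual approach buys you at the cost of Ray Data's pipelining.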

This method is valid for scenarios where strict batch boundaries or custom batch logic are required, but for most use cases, leveraging Ray Data’s built-in batching and parallelism is recommended for performance and scalability.
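For comparison, the recommended path keeps everything inside a single Ray Dataset and lets `build_llm_processor` handle batching and parallelism. A minimal configuration sketch follows; the model name, batch size, and sampling parameters are illustrative placeholders, and running this requires `ray[data]`, vLLM, and GPU resources.

```python
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Illustrative config; model_source and batch_size are placeholders.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    batch_size=64,
    concurrency=1,
)

processor = build_llm_processor(
    config,
    # preprocess maps each input row to a vLLM chat request.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.0, max_tokens=128),
    ),
    # postprocess pulls the generated text back into the row.
    postprocess=lambda row: dict(answer=row["generated_text"], **row),
)

ds = ray.data.from_items([{"prompt": "What is Ray Data?"}])
ds = processor(ds)  # Ray Data handles batching, pipelining, and parallelism
```

Here `batch_size` is a throughput hint rather than a strict boundary guarantee, which is the key trade-off versus the manual per-batch loop.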
