Ray Data provides an LLM module for efficient batch inference with large language models (LLMs). It integrates with inference engines such as vLLM and with OpenAI-compatible APIs, letting users process LLM requests in parallel across a cluster, optimize resource usage, and configure model parallelism for models too large for a single GPU.
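As a rough sketch of how this looks in practice, the example below builds a batch-inference processor backed by vLLM and maps it over a Ray Dataset. The exact model name, engine arguments, and column names (`item`, `generated_text`) are illustrative assumptions, not fixed by this document; running it requires a Ray cluster with GPU resources and vLLM installed.

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Configure the vLLM engine; model and kwargs are placeholder assumptions.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",
    engine_kwargs={"max_model_len": 8192},
    concurrency=1,   # number of vLLM engine replicas
    batch_size=64,   # rows per inference batch
)

# preprocess maps each input row to a chat request;
# postprocess extracts the generated text into an output column.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["item"]}],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items(["Summarize Ray Data in one sentence."])
ds = processor(ds)   # lazily applies batched LLM inference
ds.show()
```

Because Ray Data executes lazily, inference only runs when the dataset is consumed (e.g. via `show()` or `write_parquet()`), so the processor composes with other Ray Data transformations in the same pipeline.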
More documentation here: