About the Ray Data LLM APIs category

Ray Data has a LLM module that enables efficient batch inference with large language models (LLMs) using Ray Data. It integrates with inference engines like vLLM and OpenAI-compatible APIs, allowing users to process LLM requests in parallel, optimize resource usage, and configure model parallelism for larger models.

More documentation here:

1 Like