Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs that offer an easy way to deploy and scale multiple LLMs behind a unified interface. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
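A minimal deployment sketch using the `ray.serve.llm` module (available in recent Ray releases with the `ray[llm]` extra installed): it builds an OpenAI-compatible app around a single model config. The model ID, model source, and autoscaling bounds shown here are illustrative assumptions, not fixed values.

```python
# Sketch of a Ray Serve LLM deployment; assumes ray[llm] and a vLLM-capable GPU node.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                        # name clients use in API calls (assumed)
        model_source="Qwen/Qwen2.5-0.5B-Instruct",   # Hugging Face model to load (assumed)
    ),
    deployment_config=dict(
        # Autoscaling bounds picked for illustration only.
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(tensor_parallel_size=1),      # passed through to the engine (e.g. vLLM)
)

# Wrap the config in an OpenAI-compatible ingress and start serving.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once running, any OpenAI client can target the endpoint (e.g. base URL `http://localhost:8000/v1`) using the configured `model_id`; adding further `LLMConfig` entries to the list serves multiple models behind the same endpoint.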