Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs for deploying and scaling multiple LLM models behind a unified interface. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
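As a minimal sketch of the LLM APIs described above, a deployment config along these lines exposes a model behind an OpenAI-compatible endpoint (the model ID, source, and replica counts here are illustrative placeholders; running it requires a Ray cluster with `ray[llm]` and GPU resources):

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Describe one model: where to load it from, how to scale it,
# and any engine-specific options passed through to vLLM.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name clients use in requests
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # Hugging Face model to load
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(tensor_parallel_size=1),
)

# Build an OpenAI-compatible app (serves /v1/chat/completions etc.)
# and run it on the local Ray cluster.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once running, any OpenAI client pointed at the Serve HTTP address (port 8000 by default) can send chat completion requests using the configured `model_id`.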