Ray Serve


Ray Serve provides LLM APIs that offer an easy way to deploy and scale multiple LLM models behind a unified interface. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
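As a rough sketch of what such a deployment looks like, the following Serve config fragment wires the OpenAI-compatible LLM app into a Ray Serve application with autoscaling. The model name, source, and replica counts here are placeholder assumptions for illustration, not values from this page:

```yaml
# Hypothetical Ray Serve config sketch: serve an LLM behind an
# OpenAI-compatible endpoint with autoscaling. Model id/source and
# replica bounds below are illustrative assumptions.
applications:
  - name: llm-app
    route_prefix: /
    import_path: ray.serve.llm:build_openai_app
    args:
      llm_configs:
        - model_loading_config:
            model_id: my-llm                          # name clients use in requests
            model_source: Qwen/Qwen2.5-0.5B-Instruct  # example Hugging Face model
          deployment_config:
            autoscaling_config:
              min_replicas: 1
              max_replicas: 2
```

A config like this would typically be launched with `serve deploy <config>.yaml`, after which clients can call the endpoint with standard OpenAI-style chat-completion requests.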