Ray Serve


Ray Serve LLM APIs

Ray Serve's LLM APIs provide an easy way to deploy and scale multiple LLM models behind a unified interface. They support autoscaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
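As a rough illustration of the workflow described above, the sketch below shows how a single model might be deployed with autoscaling behind an OpenAI-compatible endpoint. It is a configuration sketch, not a definitive recipe: it assumes a recent Ray release with the `ray.serve.llm` module and a vLLM backend installed, and the model ID, model source, and replica counts are placeholder values chosen for the example.

```python
# Configuration sketch: serving one LLM via Ray Serve's LLM APIs.
# Assumes `ray[serve-llm]` and vLLM are installed; values are illustrative.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        # model_id is the name clients will use; model_source is a
        # Hugging Face repo or local path (placeholder values here).
        model_id="my-qwen",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        # Autoscaling bounds for this model's replicas.
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    # Extra kwargs forwarded to the underlying engine (vLLM here).
    engine_kwargs=dict(tensor_parallel_size=1),
)

# Build an OpenAI-compatible app from one or more LLM configs and run it;
# passing several configs is how multi-model deployment works.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once running, any OpenAI client can target the Serve HTTP endpoint (by default `http://localhost:8000/v1`) with `model="my-qwen"`, which is what makes the endpoint drop-in compatible with existing OpenAI-based tooling.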