Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs for deploying and scaling multiple LLM models behind a unified interface. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
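As a sketch of how these APIs fit together, the following deployment config uses the `ray.serve.llm` module's `LLMConfig` and `build_openai_app` to stand up an OpenAI-compatible endpoint. The model ID, checkpoint, and autoscaling values are illustrative assumptions, not taken from this page; actually running it requires a Ray cluster with GPU resources and an installed inference engine such as vLLM.

```python
# Sketch of a Ray Serve LLM deployment (illustrative values; requires
# `pip install "ray[serve,llm]"` and GPU resources to actually run).
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    # Which model to serve: `model_id` is the name clients use in requests,
    # `model_source` is the Hugging Face checkpoint to load (assumed here).
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    # Autoscaling: Serve adds or removes replicas within these bounds
    # based on request load (values chosen for illustration).
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# Build and run an OpenAI-compatible app; clients can then call
# /v1/chat/completions with model="qwen-0.5b".
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Because the endpoint is OpenAI-compatible, existing OpenAI client code can point its base URL at the Serve HTTP address and work unchanged; adding more `LLMConfig` entries to the list serves multiple models from the same app.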