Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs for deploying and scaling multiple LLM models behind a unified interface. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
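As a rough illustration of what such a deployment looks like, here is a minimal sketch of a Ray Serve config file for an OpenAI-compatible LLM app. The model name, model source, replica counts, and route prefix are all placeholder assumptions, not values from this page; consult the Ray Serve LLM documentation for the exact schema supported by your Ray version.

```yaml
# Hypothetical serve config sketch: deploy one LLM behind an
# OpenAI-compatible endpoint via Ray Serve's LLM APIs.
applications:
- name: llm_app
  route_prefix: "/"
  # build_openai_app assembles an OpenAI-compatible router
  # around the listed LLM configs.
  import_path: ray.serve.llm:build_openai_app
  args:
    llm_configs:
    - model_loading_config:
        # model_id is the name clients pass in the "model" field;
        # model_source is a placeholder Hugging Face model.
        model_id: my-model
        model_source: Qwen/Qwen2.5-0.5B-Instruct
      deployment_config:
        autoscaling_config:
          # Replica counts here are illustrative assumptions.
          min_replicas: 1
          max_replicas: 2
```

A config like this would typically be launched with `serve deploy config.yaml`, after which clients can talk to the endpoint using any OpenAI-compatible client pointed at the cluster's HTTP address.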