Ray Serve


Ray Serve provides LLM APIs for deploying and scaling multiple LLM models behind a unified API. These APIs support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
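As a rough sketch of what such a deployment looks like, the following is a minimal configuration-style example based on the `ray.serve.llm` module. The model id `qwen-0.5b`, the model source, and the autoscaling bounds are placeholder assumptions, not values from this page; running it requires `ray[serve,llm]`, a vLLM-capable GPU environment, and is not verified here.

```python
# Hedged sketch: serving one LLM with an OpenAI-compatible endpoint via
# Ray Serve's LLM APIs. All model names and limits below are illustrative.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        # model_id is what clients pass as "model"; model_source is the
        # Hugging Face repo (or local path) the engine loads weights from.
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        # Ray Serve autoscaling: replicas scale between these bounds.
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# Build one OpenAI-compatible app (/v1/chat/completions, /v1/models, ...)
# that can front multiple LLMConfigs; here just one.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```

Once running, any OpenAI-compatible client pointed at the Serve HTTP address (default `http://localhost:8000/v1`) can query the model by its `model_id`. Adding more entries to `llm_configs` is how multi-model deployment works under the same endpoint.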