Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs that make it easy to deploy and scale multiple LLM models behind a unified API. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
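As a sketch of what a deployment built on these APIs can look like, the snippet below configures a single model with autoscaling and exposes it through an OpenAI-compatible endpoint. It assumes a recent Ray release with the ray.serve.llm module (plus a vLLM backend installed); the model id, source, replica counts, and engine settings are illustrative placeholders, not recommendations.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Describe one model: where to load it from, how to scale it,
# and which engine arguments to pass through to vLLM.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name clients use in requests
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # Hugging Face model to load
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(tensor_parallel_size=1),
)

# Build an OpenAI-compatible app serving every model in the list,
# then run it on the Ray cluster.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once running, the model can be queried with any OpenAI client pointed at the Serve HTTP endpoint, passing "qwen-0.5b" as the model name; adding more LLMConfig entries to the list serves additional models behind the same endpoint.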