Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs that offer an easy way to deploy and scale multiple LLMs behind a unified interface. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference frameworks such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
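The deployment flow described above can be sketched with the `ray.serve.llm` module. This is a minimal illustration, not a definitive recipe: the model ID, model source, and autoscaling values below are placeholder assumptions, and running it requires a Ray cluster with appropriate GPU resources.

```python
# Sketch of deploying an LLM with Ray Serve's LLM APIs.
# Assumes Ray with the serve/llm extras installed and a GPU-backed cluster.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Configure one model; model_source and scaling values are example choices.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llm",  # hypothetical ID clients will use in requests
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # example Hugging Face model
    ),
    deployment_config=dict(
        # Autoscaling: Serve adds/removes replicas within these bounds.
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# Build an OpenAI-compatible app (exposes /v1/chat/completions etc.)
# and deploy it on the cluster.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once deployed, any OpenAI-compatible client can target the Serve HTTP endpoint with the configured `model_id`, which is how the unified, engine-agnostic API surfaces to callers.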