Ray Serve


Ray Serve LLM APIs

Ray Serve provides LLM APIs that offer an easy way to deploy and scale multiple LLM models behind a unified API. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with inference engines such as vLLM and SGLang, enabling efficient model serving across multiple nodes.
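As a rough illustration of how such a deployment is described, here is a minimal sketch of a Serve config file that builds an OpenAI-compatible LLM app. The model ID, model source, and autoscaling values are placeholder assumptions, not from this page; check the Ray Serve LLM docs for the exact schema supported by your Ray version.

```yaml
# Hypothetical Serve config sketch: one OpenAI-compatible LLM application.
# All names and values below are illustrative assumptions.
applications:
- name: llm-app
  route_prefix: /
  import_path: ray.serve.llm:build_openai_app
  args:
    llm_configs:
    - model_loading_config:
        model_id: my-llama            # name clients use in API requests (assumed)
        model_source: meta-llama/Llama-3.1-8B-Instruct  # example HF model
      deployment_config:
        autoscaling_config:
          min_replicas: 1             # scale replicas with load (assumed values)
          max_replicas: 2
      engine_kwargs:
        tensor_parallel_size: 1       # passed through to the engine, e.g. vLLM
```

A config like this would typically be launched with `serve run config.yaml`, after which clients can hit the OpenAI-compatible endpoint with standard chat-completions requests.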