Ray Serve

Ray Serve LLM APIs: Ray Serve's LLM APIs provide an easy way to deploy and scale multiple LLMs behind a unified API. They support automatic scaling, multi-model deployment, OpenAI-compatible endpoints, and LoRA multiplexing. The engine-agnostic architecture works with frameworks like vLLM and SGLang, enabling efficient model serving across multiple nodes.

Topic	Category	Replies	Views	Activity
About the Ray Serve category	Ray Serve	0	803	November 17, 2020
Ray Serve vLLM multiple models per GPU in tensor parallelism	Ray Serve LLM APIs	0	1	August 10, 2025
FastAPI backend + Ray Core vs Ray Serve	Ray Serve	0	4	August 10, 2025
Integrating GradioIngress and non-gradio endpoints	Ray Serve	3	498	August 9, 2025
Non-linear throughput when scaling Ray Serve replicas	Ray Serve	2	22	August 8, 2025
Ray Serve kubernetes service also uses Head pod	Ray Serve	0	12	August 6, 2025
How to download a model from an authenticated S3 storage?	Ray Serve	1	8	August 4, 2025
How to Expose Ray Serve API with proxy_location="EveryNode" Outside the Cluster	Ray Serve	1	10	August 1, 2025
Ray Replica take more time to healthy than EKS Pod	Ray Serve	0	17	July 29, 2025
Does Ray Serve support PDB in EKS / Kubernetes	Ray Serve	1	20	July 28, 2025
vLLM v1 engine initialization workaround with vllm installation at runtime	Ray Serve LLM APIs	4	72	July 20, 2025
Dynamic request batching: partial response streaming	Ray Serve	1	23	July 8, 2025
Send replica deployment logs to cloudwatch for eks pods	Ray Serve	1	25	July 7, 2025
How to find no of requests/messages per replica	Ray Serve	1	14	July 3, 2025
Serving custom-built containers hanging on deployment	Ray Serve	0	25	July 1, 2025
Does port 8000 run on head only or both workers and head	Ray Serve	1	15	June 25, 2025
How to log to stdout from Ray Serve	Ray Serve LLM APIs	1	24	June 23, 2025
Ray Serve not distributing load to all replicas equally	Ray Serve	3	55	June 20, 2025
Ray Serve Sharing Objects with Deployment	Ray Serve	14	1653	June 19, 2025
Losing Frames in the interaction of multiple @serve.deployment	Ray Serve	2	32	June 16, 2025
Ray Serve replica level autoscaling not working with Kube deployment	Ray Serve	3	29	June 11, 2025
Dynamically serve new model via Ray Serve	Ray Serve	5	87	June 11, 2025
SocketIO support	Ray Serve	1	27	June 10, 2025
torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to	Ray Serve LLM APIs	3	175	June 3, 2025
How to keep frame and detected boundingboxes in order for object tracker	Ray Serve	2	35	March 25, 2025
Query application status API triggers re-deployment?	Ray Serve	1	31	May 20, 2025
How to route traffic to LiteLLM models using Serving LLMs	Ray Serve LLM APIs	7	112	May 20, 2025
Conflict Between Orbax (nest_asyncio) and Ray Serve (uvloop) During Checkpointing – Option to Disable uvloop?	Ray Serve	0	24	May 20, 2025
Ray Serve LLM APIs has 2~3x higher latency	Ray Serve LLM APIs	7	204	May 19, 2025
Specifying resources using Ray Serve	Ray Serve	1	21	May 19, 2025