Topics include model serving and inference. Use Ray Serve to deploy and scale machine learning models, with built-in support for HTTP APIs, request batching, and multi-GPU inference.
Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to scikit-learn models, to arbitrary Python business logic. It also includes several features and performance optimizations for serving large language models, such as response streaming, dynamic request batching, and multi-node/multi-GPU serving.
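As a minimal sketch of what a Serve deployment looks like, the snippet below wraps a toy classifier behind an HTTP endpoint using the `@serve.deployment` decorator, `.bind()`, and `serve.run()`. The `SentimentClassifier` class and its stub logic are illustrative placeholders; a real deployment would load an actual model in `__init__`.

```python
import requests
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)  # scale out by running multiple replicas
class SentimentClassifier:
    def __init__(self):
        # A real deployment would load a PyTorch/TensorFlow/scikit-learn
        # model here; a word-list stub keeps the sketch self-contained.
        self.positive_words = {"good", "great", "excellent"}

    async def __call__(self, request: Request) -> dict:
        text = (await request.json())["text"]
        score = sum(word in self.positive_words for word in text.lower().split())
        return {"label": "positive" if score > 0 else "negative"}


# Bind the deployment into an application and start serving it over HTTP
# (by default at http://localhost:8000/).
app = SentimentClassifier.bind()
serve.run(app)

print(requests.post("http://localhost:8000/", json={"text": "great movie"}).json())
```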
Ray Serve is particularly well suited for model composition and many-model serving, letting you build a complex inference service out of multiple ML models and business logic, all in Python code.
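The sketch below illustrates that composition pattern under the same assumptions as the previous example: an ingress deployment receives sub-deployments as `DeploymentHandle`s and calls them like ordinary async Python. The `Preprocessor`, `Model`, and `Pipeline` names and their stub bodies are hypothetical stand-ins for real models and business logic.

```python
from ray import serve
from ray.serve.handle import DeploymentHandle
from starlette.requests import Request


@serve.deployment
class Preprocessor:
    def transform(self, text: str) -> str:
        return text.strip().lower()


@serve.deployment
class Model:
    def predict(self, text: str) -> str:
        # Placeholder for real model inference.
        return f"prediction({text})"


@serve.deployment
class Pipeline:
    def __init__(self, preprocessor: DeploymentHandle, model: DeploymentHandle):
        # Serve passes handles to the bound sub-deployments; each call
        # through a handle is routed to a replica of that deployment.
        self.preprocessor = preprocessor
        self.model = model

    async def __call__(self, request: Request) -> str:
        text = (await request.json())["text"]
        cleaned = await self.preprocessor.transform.remote(text)
        return await self.model.predict.remote(cleaned)


# Wire the deployments together and serve the composed application.
app = Pipeline.bind(Preprocessor.bind(), Model.bind())
serve.run(app)
```

Because each deployment scales independently, the preprocessing step and the model can be given different replica counts and resource allocations while the orchestration stays in plain Python.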
Documentation: