Hi Ray Serve community,
In the last few weeks, the Ray Serve team has put up a public design doc for Serve Pipeline, which aims to provide the best API for authoring an inference graph of models as a pipeline, along with artifacts and APIs for operationalizing Serve deployments. It intends to cover multi-model inference as well as large-model partitioning, such as distributed inference.
It’s an evolution of our existing Alpha API, in which many pieces will change.
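For those curious what authoring a graph could look like, here is a minimal, hypothetical sketch of composing two models into a pipeline using a `bind()`-style composition pattern. The class names, handle-call pattern, and graph semantics below are illustrative assumptions based on the existing Alpha API, not the final design in the doc:

```python
from ray import serve


@serve.deployment
class Preprocessor:
    def __call__(self, raw: str) -> str:
        # Toy preprocessing step standing in for a real featurizer.
        return raw.strip().lower()


@serve.deployment
class Classifier:
    def __init__(self, preprocessor):
        # The upstream graph node is injected as a handle at deploy time.
        self.preprocessor = preprocessor

    async def __call__(self, raw: str) -> str:
        cleaned = await self.preprocessor.remote(raw)
        # Toy model standing in for real inference.
        return "positive" if "good" in cleaned else "negative"


# Compose deployments into a two-stage inference graph: the Classifier
# node takes the Preprocessor node as a constructor argument.
preprocessor = Preprocessor.bind()
classifier = Classifier.bind(preprocessor)

# Deploy the whole graph from its root node.
serve.run(classifier)
```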
We’re actively looking for community comments, feedback, and collaboration on this effort. Please don’t hesitate to reach out by commenting on the doc or on this thread, via our Slack channel, or by emailing us at serving@anyscale.com. Thanks!
Best,
Ray Serve Team