Better way to restructure current model serving on Kubernetes

lihost · January 14, 2022, 12:12am

Hi Team,

I am looking for recommendations on restructuring our current model serving platform.

[Image: Depiction of present model serving]

[Image: Ray-transformed model serving]

With the Ray-transformed architecture, the pain point of CPU- and memory-heavy models has been addressed by serving their individual components separately, as depicted in the image for Model 2.

But now I end up having to deploy all the models (including their sub-components) and the FastAPI layer individually, and in a particular order.

This is definitely not the way to go. I am sure there must be a better approach to orchestrating all these individual deployments, or maybe another way to accomplish this altogether.
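To make the ordering problem concrete, here is a minimal sketch of the dependency order that currently has to be maintained by hand. The component names (`model_1`, `model_2_preprocess`, `model_2_predict`, `fastapi_layer`) and the `deploy` callback are hypothetical placeholders, not the actual setup, and the sketch uses only the Python standard library (`graphlib`, Python 3.9+) rather than any Ray API.

```python
from graphlib import TopologicalSorter

# Hypothetical components. Each key maps to the set of components
# that must already be deployed before it can be deployed.
DEPENDS_ON = {
    "model_1": set(),
    "model_2_preprocess": set(),
    "model_2_predict": {"model_2_preprocess"},
    "fastapi_layer": {"model_1", "model_2_predict"},
}


def deployment_order(depends_on):
    """Return component names so that each one comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def deploy_all(depends_on, deploy):
    """Call deploy(name) for every component in a valid order; return that order."""
    order = deployment_order(depends_on)
    for name in order:
        deploy(name)
    return order


if __name__ == "__main__":
    # Placeholder deploy step: a real one would roll out the component.
    deploy_all(DEPENDS_ON, lambda name: print(f"deploying {name}"))
```

Every new model or sub-component adds another entry to this graph, which is the manual bookkeeping I would like an orchestration layer to take over.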

I am open to ideas, and would appreciate it if someone could point me to any available docs.

Thank you in advance!
