Better way to restructure current model serving on Kubernetes

Hi Team,

I am looking for recommendations on restructuring our current model serving platform.

  1. Depiction of present model serving

  2. Ray-transformed model serving



With the Ray-transformed architecture, the pain point of CPU- and memory-heavy models has been addressed by serving their individual components separately, as depicted in the image for Model 2.

But now I end up having to:

  • Deploy all models (including sub-components) and the FastAPI layer individually, and in a particular order.
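
To make the ordering problem concrete, here is a minimal sketch of the kind of dependency order I currently have to follow by hand. The component names (`model_1`, `model_2_preprocessor`, `model_2_predictor`, `fastapi_layer`) are hypothetical placeholders, not my real deployments, and the sketch only uses Python's standard `graphlib` module rather than any Ray or Kubernetes API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (assumption, for illustration only):
# each key must be deployed after every name in its set.
DEPENDS_ON = {
    "model_1": set(),
    "model_2_preprocessor": set(),
    "model_2_predictor": {"model_2_preprocessor"},
    "fastapi_layer": {"model_1", "model_2_predictor"},
}


def deployment_order(depends_on):
    """Return one valid deployment order.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, name in enumerate(deployment_order(DEPENDS_ON), start=1):
        print(f"{step}. deploy {name}")
```

Every new sub-component adds another entry to this map and another manual deployment step, which is the part I would like an orchestration layer to handle for me.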


This is definitely not the way to go. I am sure there must be a better approach to orchestrating all these individual deployments, or maybe another way to accomplish this altogether.

I am open to ideas, and I would appreciate it if someone could point me to any available docs.

Thank you in advance!