Hi Team,
I am looking for recommendations on restructuring our current model serving platform. Here are diagrams of the present and the Ray-transformed setups:
- Depiction of present model serving
- Ray-transformed model serving
With the Ray-transformed architecture, the pain point of CPU- and memory-heavy models has been addressed by serving their individual components separately, as depicted in the image for Model 2.
But now I end up having to:
- Deploy all models (including their sub-components) and the FastAPI layer individually, and in a particular order.
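The ordering constraint comes from the call chain: the FastAPI layer calls the models, and a split model like Model 2 calls its sub-components, so every callee has to be up before its caller. A minimal sketch, assuming a hypothetical dependency map (the component names below are illustrative and not taken from the diagrams), derives that order with Python's standard-library `graphlib`:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists the deployments it calls.
# Names are illustrative only; they are not taken from the diagrams.
DEPENDENCIES: dict[str, set[str]] = {
    "fastapi_layer": {"model_1", "model_2"},
    "model_1": set(),
    "model_2": {"model_2_preprocessor", "model_2_core"},
    "model_2_preprocessor": set(),
    "model_2_core": set(),
}


def deploy_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every deployment comes after all of its callees."""
    # TopologicalSorter treats the mapped values as predecessors, so callees
    # are emitted before the deployments that call them. A cycle raises CycleError.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, name in enumerate(deploy_order(DEPENDENCIES), start=1):
        print(f"{step}. deploy {name}")
```

This only makes the existing order explicit; it still leaves each piece as a separate deployment step.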
This is definitely not the way to go. I am sure there must be a better approach to orchestrating all these individual deployments, or maybe another way to accomplish this altogether.
I am open to ideas, or to pointers to any relevant docs.
Thank you in advance!