Currently, I am using FastAPI to serve models.
The architecture is pretty simple.
When the FastAPI app starts up, it creates model objects by looking for all models marked as active in a config file.
Each model object is essentially a class with methods for loading the model and running inference.
The model objects are stored in a map keyed by a unique model-identifier string.
At inference time, the model id from the request is used to look up the model object in the map, and that model's infer method is called.
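For context, the registry pattern looks roughly like the sketch below. The names are illustrative (not our actual code), and the FastAPI/HTTP wiring is omitted:

```python
# Minimal sketch of our current registry pattern (FastAPI wiring omitted).

class ModelWrapper:
    """Wraps one model: knows how to load itself and run inference."""

    def __init__(self, model_id: str):
        self.model_id = model_id
        self.model = None

    def load(self):
        # In the real app this deserializes model weights from storage;
        # a trivial callable stands in for the loaded model here.
        self.model = lambda x: {"model_id": self.model_id, "prediction": x * 2}

    def infer(self, payload):
        return self.model(payload)


# Built once at app startup from the "active" entries in a config file.
MODEL_REGISTRY: dict[str, ModelWrapper] = {}

def startup(active_model_ids):
    for model_id in active_model_ids:
        wrapper = ModelWrapper(model_id)
        wrapper.load()
        MODEL_REGISTRY[model_id] = wrapper

def predict(model_id: str, payload):
    # The request handler looks the model up by id and dispatches to it.
    return MODEL_REGISTRY[model_id].infer(payload)

startup(["model_a", "model_b"])
print(predict("model_a", 3))  # {'model_id': 'model_a', 'prediction': 6}
```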
Now, in order to scale this, we want to convert the application to leverage Ray / Ray Serve.
What would be the best way to do this?
Should we convert all of our model classes into Serve deployments (`@serve.deployment`)?
We also have certain use-cases where two individual sklearn models run sequentially inside a single model object. We want to convert these into parallel workflows using Ray DAGs.
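To make the two-model use-case concrete, here is the shape of the change we are after. The stand-in callables replace the real sklearn models, and stdlib threads replace whatever Ray construct would do this properly, since the Ray side is exactly what we are asking about:

```python
# Sketch of the two-model use-case. Today the two sklearn models run
# back-to-back inside one object; trivial callables stand in for them here.
from concurrent.futures import ThreadPoolExecutor

def model_a(features):
    return sum(features)   # stand-in for the first sklearn model

def model_b(features):
    return max(features)   # stand-in for the second sklearn model

def combine(a_out, b_out):
    return {"a": a_out, "b": b_out}

# Current behaviour: strictly sequential, even though the models
# are independent of each other.
def infer_sequential(features):
    a_out = model_a(features)
    b_out = model_b(features)  # waits for model_a for no reason
    return combine(a_out, b_out)

# The shape we want: both models scored concurrently, then combined.
# (Shown with stdlib threads; this is where we hope a Ray DAG of
# tasks/deployments would slot in.)
def infer_parallel(features):
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(model_a, features)
        fut_b = pool.submit(model_b, features)
        return combine(fut_a.result(), fut_b.result())

print(infer_sequential([1, 2, 3]))  # {'a': 6, 'b': 3}
print(infer_parallel([1, 2, 3]))    # same result, models run concurrently
```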
Any comments on the best way to perform this migration would be appreciated.