Running 10+ models on a Ray cluster

delioda79 · February 18, 2022, 3:29pm

Hi,
We had an architecture in mind for one of our applications. We have quite a few models that have to be served through an API. We could package each one as a separate Ray Serve application, but we wanted to use a single Ray cluster and run one application with an endpoint per model. Each model also depends on some preprocessing models, Hugging Face transformers, and gensim embeddings. What we did was create one deployment per model, plus a deployment that exposes an API through the FastAPI integration. Our problem, however, was not only the amount of RAM and CPU used (which, OK, is expected) and some autoscaling issues (also expected); the big problem was that some of the actors and deployments never started, even though the cluster had the required amount of resources. At some point the system would become unresponsive and would not schedule everything.
To clarify the idea: instead of deploying separate services and scaling each one via Kubernetes, we wanted to scale through Ray, so each model would be a deployment rather than a separate service.
Do you guys think this is something that should not be attempted? Is it out of scope for Ray Serve, and for Ray in general?
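Ray places each actor, and therefore each Serve replica, on a single node, so a cluster whose total CPU and memory exceed the sum of all requests can still leave some replicas pending if no single node has enough free capacity left for them. The sketch below is one way to sanity-check a plan of per-model deployments before deploying it. It is a simplified first-fit-decreasing model, not Ray's actual scheduling policy; `DeploymentSpec`, `totals_fit`, and `place_replicas` are hypothetical helpers, and the node sizes and per-replica requests (which in Serve would come from each deployment's `ray_actor_options`, e.g. `num_cpus`) are made-up numbers.

```python
from dataclasses import dataclass


# Hypothetical description of one deployment's per-replica resource request.
@dataclass(frozen=True)
class DeploymentSpec:
    name: str
    num_replicas: int
    cpus: float        # CPUs requested per replica
    memory_gb: float   # memory requested per replica, in GB


def totals_fit(specs, nodes):
    """True if the summed requests fit within the summed node capacity."""
    need_cpu = sum(s.num_replicas * s.cpus for s in specs)
    need_mem = sum(s.num_replicas * s.memory_gb for s in specs)
    have_cpu = sum(cpu for cpu, _ in nodes.values())
    have_mem = sum(mem for _, mem in nodes.values())
    return need_cpu <= have_cpu and need_mem <= have_mem


def place_replicas(specs, nodes):
    """Simplified first-fit-decreasing placement; each replica must fit on ONE node.

    nodes maps node name -> (free_cpus, free_memory_gb).
    Returns (placements, unplaced): placements is a list of
    (deployment name, node name) pairs, unplaced lists the deployment
    name of every replica that found no node with enough free capacity.
    """
    free = {node: [cpu, mem] for node, (cpu, mem) in nodes.items()}
    replicas = [s for s in specs for _ in range(s.num_replicas)]
    # Try the largest requests first, a common bin-packing heuristic.
    replicas.sort(key=lambda s: (s.cpus, s.memory_gb), reverse=True)
    placements, unplaced = [], []
    for spec in replicas:
        for node, cap in free.items():
            if cap[0] >= spec.cpus and cap[1] >= spec.memory_gb:
                cap[0] -= spec.cpus
                cap[1] -= spec.memory_gb
                placements.append((spec.name, node))
                break
        else:  # no node had room left for this replica
            unplaced.append(spec.name)
    return placements, unplaced


# Illustrative numbers only: two nodes with 4 CPUs and 16 GB each.
nodes = {"node-1": (4.0, 16.0), "node-2": (4.0, 16.0)}
specs = [
    DeploymentSpec("transformer", num_replicas=3, cpus=2.0, memory_gb=10.0),
    DeploymentSpec("ingress", num_replicas=1, cpus=0.5, memory_gb=1.0),
]

print(totals_fit(specs, nodes))      # True: 6.5 CPUs / 31 GB needed, 8 / 32 available
print(place_replicas(specs, nodes))  # one "transformer" replica is left unplaced
```

In this made-up example the totals fit, yet after two 10 GB replicas land on separate nodes, each node has only 6 GB free, so the third replica has nowhere to go. Comparing per-node free capacity against the largest single replica request is usually more informative than comparing cluster totals.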

Thanks for your opinions on this.

Ameer_Haj_Ali · February 27, 2022, 2:17pm

cc @Alex_Wu. Can you plz answer ^?
