Best practice for loading deep learning models in production on Ray Serve

Hello Team,

I am testing out different deep learning models on my local machine using Docker. Currently, all the models are stored on S3, and I have written a custom download util script which downloads the models before triggering the Serve deployment.

The two questions which I have are:

  • Does Ray provide some utility out of the box which can be used to download model files from S3 for serving purposes?

  • Will I be able to hot-reload the Serve deployment when new models are present on S3 and need to be downloaded?
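For reference, a minimal sketch of the kind of download util I mean, assuming boto3 is available; the bucket and prefix names are placeholders:

```python
# Minimal sketch of a pre-startup download util; bucket/prefix are placeholders.
import os

import boto3


def download_models(bucket: str = "my-model-bucket", prefix: str = "models/",
                    dest: str = "/models") -> None:
    """Copy every object under `prefix` into `dest` before Serve starts."""
    s3 = boto3.client("s3")
    os.makedirs(dest, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):  # skip folder markers
                continue
            local_path = os.path.join(dest, os.path.basename(obj["Key"]))
            s3.download_file(bucket, obj["Key"], local_path)


if __name__ == "__main__":
    download_models()
```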

Hi @AbhishekBose,

Great question! Serve doesn’t provide a utility for downloading from S3. However, you can hot-reload the model weights using a background polling thread. Would this satisfy your requirements?

@simon-mo Thanks for your response.
So I am guessing that I will have to write a util which downloads the weights before the Ray inference worker starts up.
Regarding the background thread you mentioned: does it have to be part of the application in which my serving route is written, or run as a completely standalone Ray worker?

Either approach works! I would recommend starting with the thread approach because it is simpler.
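Here is a rough sketch of the thread approach with the current Serve API. The bucket/key names, the 60-second poll interval, and the load_model helper are all placeholders, not part of Serve itself:

```python
import threading
import time

import boto3  # assumes the weights live in S3 and boto3 is installed
from ray import serve


def load_model(path: str):
    # Placeholder: swap in your framework's loader (e.g. torch.load).
    raise NotImplementedError


@serve.deployment
class ModelServer:
    def __init__(self, bucket: str = "my-model-bucket", key: str = "model/latest.pt"):
        # bucket/key are placeholders for wherever your weights live.
        self._bucket, self._key = bucket, key
        self._s3 = boto3.client("s3")
        self._lock = threading.Lock()
        self._etag = None
        self._model = None
        self._refresh()  # initial download before serving traffic
        # Background thread that periodically polls S3 for new weights.
        threading.Thread(target=self._poll_loop, daemon=True).start()

    def _refresh(self):
        head = self._s3.head_object(Bucket=self._bucket, Key=self._key)
        if head["ETag"] == self._etag:
            return  # weights unchanged, nothing to do
        local_path = "/tmp/model.pt"
        self._s3.download_file(self._bucket, self._key, local_path)
        new_model = load_model(local_path)
        with self._lock:
            self._model = new_model
            self._etag = head["ETag"]

    def _poll_loop(self, interval_s: int = 60):
        while True:
            time.sleep(interval_s)
            try:
                self._refresh()
            except Exception as exc:
                print(f"Weight refresh failed, keeping old model: {exc}")

    async def __call__(self, request):
        payload = await request.json()
        with self._lock:
            model = self._model
        return {"prediction": model(payload["input"])}


app = ModelServer.bind()  # start with: serve run my_module:app
```

The lock only guards the swap of the model reference, so in-flight requests keep using the old weights until the new ones are fully loaded.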

In my experience, in the old Serve API V1 I liked to deploy a little FastAPI “admin” deployment alongside the models so that I could programmatically manage those models. If you have a /reload endpoint, the job that saves the weights to S3 can just call it, so you do not have to do any polling. If you have a use case where multiple model weights land on S3, your FastAPI admin service can have an /add-model endpoint that calls something like S3ServeDeployer.options(name=model-endpoint).deploy(S3_path), which dynamically takes whatever weights you have on S3 and deploys them on whatever new endpoint you want to create.
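A rough sketch of what that pattern looked like; S3ServeDeployer and its download_and_load helper are hypothetical, and the .options(...).deploy(...) calls are from the old Serve V1 API:

```python
from fastapi import FastAPI
from ray import serve

api = FastAPI()


def download_and_load(s3_path: str):
    # Placeholder: download the weights from s3_path and return a callable model.
    raise NotImplementedError


@serve.deployment
class S3ServeDeployer:
    """Hypothetical deployment that serves whatever weights live at an S3 path."""

    def __init__(self, s3_path: str):
        self.model = download_and_load(s3_path)

    async def __call__(self, request):
        return {"prediction": self.model(await request.json())}


@serve.deployment(route_prefix="/admin")
@serve.ingress(api)
class Admin:
    @api.post("/add-model")
    def add_model(self, name: str, s3_path: str):
        # Dynamically create an endpoint backed by the weights at s3_path.
        S3ServeDeployer.options(name=name, route_prefix=f"/{name}").deploy(s3_path)
        return {"status": "deployed", "model": name}

    @api.post("/reload")
    def reload(self, name: str, s3_path: str):
        # Re-deploying under the same name rolls the replicas onto the new weights,
        # so the job that writes weights to S3 can call this instead of polling.
        S3ServeDeployer.options(name=name, route_prefix=f"/{name}").deploy(s3_path)
        return {"status": "reloaded", "model": name}


serve.start(detached=True)
Admin.deploy()
```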

Of course, this is all with the old API; I don’t think it’s as easy to achieve with the new one.
