Best practice for loading deep learning models in production on Ray Serve

Hello Team,

I am testing out different deep learning models on my local machine using Docker. Currently, all the models are stored on S3, and I have written a custom download util script which downloads the models before triggering the Serve deployment.

The two questions which I have are:

  • Does Ray provide some utility out of the box which can be used to download model files from S3 for serving purposes?

  • Will I be able to hot-reload the Serve deployment when new models are present on S3 and need to be downloaded?
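For reference, a minimal sketch of the kind of download util I mean, assuming boto3 is available; the bucket and prefix names are placeholders:

```python
# Minimal sketch of a pre-startup download util; bucket/prefix are placeholders.
import os

import boto3


def download_models(bucket: str = "my-model-bucket", prefix: str = "models/",
                    dest: str = "/models") -> None:
    """Copy every object under `prefix` into `dest` before Serve starts."""
    s3 = boto3.client("s3")
    os.makedirs(dest, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):  # skip folder markers
                continue
            local_path = os.path.join(dest, os.path.basename(obj["Key"]))
            s3.download_file(bucket, obj["Key"], local_path)


if __name__ == "__main__":
    download_models()
```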

Hi @AbhishekBose,

Great question! Serve doesn’t provide a utility for downloading from S3. However, you can hot-reload the model weights using a background polling thread. Would this satisfy your requirements?

@simon-mo Thanks for your response.
So I am guessing that I will have to write a util which downloads the weights before the Ray inference worker starts up.
Regarding the background thread you mentioned: does it have to be part of the application in which my serving route is written, or run as a completely standalone Ray worker?

Either approach works! I would recommend starting with the thread approach because it is simpler.
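Here is a rough sketch of the thread approach with the current Serve API. The bucket/key names, the 60-second poll interval, and the load_model helper are all placeholders, not part of Serve itself:

```python
import threading
import time

import boto3  # assumes the weights live in S3 and boto3 is installed
from ray import serve


def load_model(path: str):
    # Placeholder: swap in your framework's loader (e.g. torch.load).
    raise NotImplementedError


@serve.deployment
class ModelServer:
    def __init__(self, bucket: str = "my-model-bucket", key: str = "model/latest.pt"):
        # bucket/key are placeholders for wherever your weights live.
        self._bucket, self._key = bucket, key
        self._s3 = boto3.client("s3")
        self._lock = threading.Lock()
        self._etag = None
        self._model = None
        self._refresh()  # initial download before serving traffic
        # Background thread that periodically polls S3 for new weights.
        threading.Thread(target=self._poll_loop, daemon=True).start()

    def _refresh(self):
        head = self._s3.head_object(Bucket=self._bucket, Key=self._key)
        if head["ETag"] == self._etag:
            return  # weights unchanged, nothing to do
        local_path = "/tmp/model.pt"
        self._s3.download_file(self._bucket, self._key, local_path)
        new_model = load_model(local_path)
        with self._lock:
            self._model = new_model
            self._etag = head["ETag"]

    def _poll_loop(self, interval_s: int = 60):
        while True:
            time.sleep(interval_s)
            try:
                self._refresh()
            except Exception as exc:
                print(f"Weight refresh failed, keeping old model: {exc}")

    async def __call__(self, request):
        payload = await request.json()
        with self._lock:
            model = self._model
        return {"prediction": model(payload["input"])}


app = ModelServer.bind()  # start with: serve run my_module:app
```

The lock only guards the swap of the model reference, so in-flight requests keep using the old weights until the new ones are fully loaded.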

In my experience, in the old Serve API V1 I liked to deploy a little FastAPI “admin” deployment alongside the models so that I could programmatically manage those models. If you have a /reload endpoint, the job that saves the weights to S3 can just call it, so you do not have to do any polling. If you have a use case where multiple model weights land on S3, your FastAPI admin service can have an /add-model endpoint that calls something like S3ServeDeployer.options(name=model-endpoint).deploy(S3_path), which dynamically takes whatever weights you have on S3 and deploys them on whatever new endpoint you want to create.
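A rough sketch of what that pattern looked like; S3ServeDeployer and its download_and_load helper are hypothetical, and the .options(...).deploy(...) calls are from the old Serve V1 API:

```python
from fastapi import FastAPI
from ray import serve

api = FastAPI()


def download_and_load(s3_path: str):
    # Placeholder: download the weights from s3_path and return a callable model.
    raise NotImplementedError


@serve.deployment
class S3ServeDeployer:
    """Hypothetical deployment that serves whatever weights live at an S3 path."""

    def __init__(self, s3_path: str):
        self.model = download_and_load(s3_path)

    async def __call__(self, request):
        return {"prediction": self.model(await request.json())}


@serve.deployment(route_prefix="/admin")
@serve.ingress(api)
class Admin:
    @api.post("/add-model")
    def add_model(self, name: str, s3_path: str):
        # Dynamically create an endpoint backed by the weights at s3_path.
        S3ServeDeployer.options(name=name, route_prefix=f"/{name}").deploy(s3_path)
        return {"status": "deployed", "model": name}

    @api.post("/reload")
    def reload(self, name: str, s3_path: str):
        # Re-deploying under the same name rolls the replicas onto the new weights,
        # so the job that writes weights to S3 can call this instead of polling.
        S3ServeDeployer.options(name=name, route_prefix=f"/{name}").deploy(s3_path)
        return {"status": "reloaded", "model": name}


serve.start(detached=True)
Admin.deploy()
```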

Of course, this is all with the old API; I don’t think it’s as easy to achieve with the new one.
