I’m loading a large language model in the init function of my Ray Serve deployment. The model takes some time to load, and Ray keeps tearing down and recreating the deployment, with messages such as these:
Recovering target state for deployment Generate from checkpoint…
(ServeController pid=2866163) INFO 2023-04-08 11:17:07,369 controller 2866163 deployment_state.py:1310 - Adding 1 replica to deployment 'Generate'.
How do I increase the time allowed to load a model in the init function of a deployment?
Hi @ankur_ankur! Glad to hear that you are exploring Ray Serve!
From the log, it looks like the Serve controller died and was restarted. Can you double-check the node's resources while the model is loading? Under /tmp/ray/session_latest/logs/serve/ you should see multiple controller logs. Can you check all of them for failures?
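If it helps, here is a rough sketch of a script that scans every file in that log directory for lines that look like failures. It uses only the Python standard library. The function name `find_failures` and the marker strings are my own assumptions, not anything Ray defines, so adjust them to whatever your failures actually look like.

```python
from pathlib import Path

# Default location, taken from the path mentioned above.
LOG_DIR = Path("/tmp/ray/session_latest/logs/serve")

# Hypothetical markers for "something went wrong"; not an official Ray list.
MARKERS = ("ERROR", "Traceback", "Killed", "OutOfMemory")


def find_failures(log_dir=LOG_DIR, markers=MARKERS):
    """Return (file name, line number, line) for each log line containing a marker."""
    hits = []
    if not log_dir.is_dir():
        return hits
    for path in sorted(log_dir.iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path.name, lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    for name, lineno, line in find_failures():
        print(f"{name}:{lineno}: {line}")
```

Run it on the node where the controller was running. If it prints nothing, the markers may simply not match, so it is still worth opening the controller logs by hand.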