How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
I am trying to test serving an LLM on a local machine using FastAPI, and the model takes a while to load: the deployment's __init__()
method runs for more than 30 seconds, the deployment appears to go into a restart loop, and I get this message in the logs:
“Deployment ‘my_deployment’ in application ‘my_app’ has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow init or reconfigure method.”
The message is correct: the init is slow, and there is nothing I can do about that at this stage. Is there a way to remove the timeout on the init method, or to set it to a longer value?
Thanks
Hi @nelsonrogers, thanks for posting. That message is a warning, not an error. The deployment replica does not restart every 30s; it continues to initialize, and the warning is emitted every 30s to signal that the __init__
method is still running, just slowly. Does the replica eventually start?
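In the meantime, you can watch the replica's state while the constructor runs. A minimal sketch, assuming a recent Ray 2.x where serve.status() is available (the CLI equivalent is serve status):

import time

from ray import serve

# Poll the Serve controller and print the application/deployment
# statuses it reports; repeated replica restarts will show up here
# even when the replica's own log ends abruptly.
for _ in range(20):
    print(serve.status())
    time.sleep(10)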
Ok, thanks for the reply.
Unfortunately, the replica does not start. It cuts out in the middle of loading the model (it loads 2 out of 4 shards of the model, based on the logs) and terminates suddenly without giving any further error messages (or maybe it does print one, but the replica dies before I can see it). It just ends up doing this in a loop until I kill it manually.
Here is my code, in case it helps:
from fastapi import FastAPI, HTTPException
from ray import serve
from pydantic import BaseModel
from transformers import pipeline
import torch
from huggingface_hub import login

app = FastAPI()


# Define Pydantic models for the request body
class Message(BaseModel):
    role: str
    content: str


class Messages(BaseModel):
    messages: list[Message]


# Define the deployment
@serve.deployment(name="MyModel", num_replicas=1)
@serve.ingress(app)
class MyModel:
    def __init__(self):
        # Log in to the HuggingFace Hub
        login(token="my_token", add_to_git_credential=True)

        # Initialize the transformers pipeline
        model_id = "my_model_id"
        self.pipe = pipeline(
            "text-generation",
            model=model_id,
            model_kwargs={"torch_dtype": torch.bfloat16},
            device="cpu",
        )

    @app.post("/")
    async def generate(self, messages: Messages):
        try:
            # Prepare the messages for the model
            chat = [
                {"role": msg.role, "content": msg.content}
                for msg in messages.messages
            ]

            # Generate the response using the pipeline
            outputs = self.pipe(
                chat,
                max_new_tokens=256,
                do_sample=False,
            )
            generated_text = outputs[0]["generated_text"][-1]["content"]

            # Return the generated response
            return {"generated_text": generated_text}
        except Exception as e:
            raise HTTPException(status_code=500, detail=str(e))


# The deployment is created here
deployment = MyModel.bind()
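One thing worth checking, given that the replica dies silently partway through loading the shards: a multi-shard model materialized in bfloat16 on CPU can exhaust node RAM, and when that happens Ray tends to kill the worker process with little trace in the application log (the raylet logs under /tmp/ray/session_latest/logs usually record it). A hedged sketch of requesting explicit memory for the replica; only the decorator changes, and the 16 GiB figure is an assumption to be sized to the model:

# Same class body as above; only the decorator changes.
@serve.deployment(
    name="MyModel",
    num_replicas=1,
    # "memory" is forwarded to the underlying Ray actor, in bytes.
    # 16 GiB here is an assumed figure, not a recommendation.
    ray_actor_options={"memory": 16 * 1024**3},
)
@serve.ingress(app)
class MyModel:
    ...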
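For anyone reproducing this, the app can be launched and smoke-tested like so (the filename my_app.py and the prompt are placeholders):

serve run my_app:deployment

Then, from another shell:

import requests

# Minimal smoke test against Serve's default HTTP address and the
# "/" route defined above.
resp = requests.post(
    "http://127.0.0.1:8000/",
    json={"messages": [{"role": "user", "content": "Hello!"}]},
)
print(resp.json())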