Ray Serve Replica taking a lot of memory before requests even come in

I just started my Ray Serve application and noticed high memory usage before I sent in my first request.
I have two deployments: one is a health check and the other does the prediction. I'm curious why each of my prediction replicas uses about 1 GiB of memory before any request has come in. Here is my code:

@serve.deployment(route_prefix="/predict", num_replicas=conf[constants.SERVER_COUNT])
class ImageModel:
    def __init__(self):
        self.logger = util.make_logger("ray")
        self.model_actor = ray.get_actor(MODEL_ACTOR)
        self.context = make_context()  # an object to store loggers and datetime functions
        self.logger.info("Ready to Serve")

A base replica without any imports or dependencies should take <100 MiB (as shown in the second row). Do you have a model initialized inside the prediction replica?
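If you want to confirm where the memory goes, one option is to log the process's resident set size from inside the replica's `__init__`. A minimal sketch using only the standard library (the `resource` module is Unix-only, and `ru_maxrss` is reported in KiB on Linux but in bytes on macOS; the function name is just an illustration):

```python
import resource
import sys

def log_peak_rss(tag: str) -> int:
    """Print and return the current process's peak RSS so far."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports KiB, macOS reports bytes
    unit = "bytes" if sys.platform == "darwin" else "KiB"
    print(f"[{tag}] peak RSS: {rss} {unit}")
    return rss
```

Calling this at the start and end of `__init__` shows how much the constructor itself allocates versus what the replica process already carries from imports.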

Hi Simon,
only the above. Does that count as model initialization?

I actually perform the model initialization in the same script, in the main function:

def build_model_actor(model_name: str, conf: Dict):
    _options = dict(name=model_name, num_gpus=.., max_concurrency=..)
    print(f"Model Actor Options:{_options}")
    return ModelActor.options(**_options).remote(conf)

# app.py
from . import predictor_util
from my_classificatin.prediction import predictor

MODEL_ACTOR = "model_actor"
conf = {} # some values

class ImageModel:
    ...  # same as above

if __name__ == "__main__":
    import time

    model_actor = predictor.build_model_actor(MODEL_ACTOR, conf)
    HealthCheck.deploy()  # In the snapshot, this is the one taking around 87.1 MiB

    while True:
        time.sleep(5)  # keep the driver process alive so the deployments stay up

Hmm, that shouldn't be the cause. The ActorHandle returned by ray.get_actor is pretty lightweight. Are there any other variables that might use a big chunk of memory in ImageModel? There might also be unintentional closure capture (i.e., the ImageModel class definition capturing a big global variable).
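One way unintentional capture can sneak in: a function or method that references a module-level object keeps that object alive, and when Ray serializes the definition with cloudpickle to ship it to a replica, the referenced object can come along with it. A pure-Python sketch of the capture mechanism (no Ray needed; the large list is a hypothetical stand-in for a model or lookup table):

```python
def make_handler():
    big_table = list(range(1_000_000))  # stand-in for a large model/lookup table

    def handler(i):
        # handler closes over big_table, so the whole list stays alive
        # for as long as handler exists (and would travel with it when
        # the function is serialized)
        return big_table[i]

    return handler

h = make_handler()
# Inspect the closure: the full list is pinned in memory by the handler.
captured = h.__closure__[0].cell_contents
print(len(captured))  # 1000000
```

Checking whether any method of ImageModel references a module-level model, config blob, or dataset in this way is a quick first step before profiling.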