Ray Serve Replica taking a lot of memory before requests even come in

rabraham · September 18, 2021, 5:17pm

Hi,
I just started my Ray Serve application and noticed the following high memory usage before sending my first request.
I have two deployments: one is a health check and the other does the prediction. I'm curious why my prediction replicas are already using about 1 GB of memory before I've sent any request. Here is my code:

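import ray
from ray import serve

# conf, constants, util, make_context and MODEL_ACTOR are defined elsewhere in the app (not shown here)
@serve.deployment(route_prefix="/predict", num_replicas=conf[constants.SERVER_COUNT])
class ImageModel:
    def __init__(self):
        self.logger = util.make_logger("ray")
        self.model_actor = ray.get_actor(MODEL_ACTOR)  # handle to the named model actor started in main
        self.context = make_context()  # an object to store loggers and datetime functions
        self.logger.info("Ready to Serve")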

simon-mo · September 24, 2021, 6:08pm

A base replica without any imports or dependencies should take less than 100 MiB (as shown in the second row). Are you initializing any model inside the prediction replica?
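One way to answer that from inside the replica is to record the process's memory after each step of `__init__`. The sketch below is a hypothetical diagnostic, not a Ray API: it uses only the standard-library `resource` module (Unix-only), `peak_rss_mib` and `InitProbe` are illustrative names, and the byte allocation is a stand-in for a heavy step such as loading model weights.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB.

    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


class InitProbe:
    """Hypothetical stand-in for a replica class that records memory after each init step."""

    def __init__(self):
        self.checkpoints = [("start", peak_rss_mib())]
        # Stand-in for a heavy step (e.g. loading a model): writes ~50 MiB of data
        self.blob = b"\x01" * (50 * 1024 * 1024)
        self.checkpoints.append(("after heavy step", peak_rss_mib()))


probe = InitProbe()
for step, mib in probe.checkpoints:
    print(f"{step}: peak RSS {mib:.1f} MiB")
```

Because `ru_maxrss` is a high-water mark, a step only shows up when it pushes memory past the previous peak. In a real replica you would log each checkpoint with `self.logger.info(...)` instead of collecting a list.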

rabraham · September 28, 2021, 3:28pm

Hi Simon,
Only the code above. Does that count as model initialization?

I actually initialize the model in the main block of the same script, by creating a separate named model actor:

# predictor.py
from typing import Dict

def build_model_actor(model_name: str, conf: Dict):
    _options = dict(name=model_name, num_gpus=.., max_concurrency=..)
    print(f"Model Actor Options:{_options}")
    # ModelActor is a @ray.remote class defined elsewhere in predictor.py (not shown)
    return ModelActor.options(**_options).remote(conf)

# app.py
import time

import ray
from ray import serve

from . import predictor_util
from my_classificatin.prediction import predictor

MODEL_ACTOR = "model_actor"
conf = {}  # some values

class ImageModel:
    ...  # same as above

if __name__ == "__main__":
    ray.init()

    # Create the named model actor once; ImageModel replicas look it up via ray.get_actor
    model_actor = predictor.build_model_actor(MODEL_ACTOR, conf)
    serve.start()
    ImageModel.deploy()
    HealthCheck.deploy()  # In the snapshot, this is the one taking around 87.1 MiB

    # Keep the driver process alive so the actors and deployments keep running
    while True:
        time.sleep(600)

simon-mo · September 29, 2021, 4:33pm

Hmm, that shouldn't count. The ActorHandle returned by ray.get_actor is pretty lightweight. Are there any other variables in ImageModel that might use a big chunk of memory? There might also be an unintentional closure capture (i.e., the ImageModel class definition capturing a large global variable).
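To check the closure-capture theory, one rough approach is to list which module-level globals a class's methods refer to by name: a class defined in `__main__` is serialized by value (via cloudpickle), roughly together with the globals its functions reference. The helper below is a hypothetical, stdlib-only approximation, not how cloudpickle itself decides what to capture; `captured_globals`, `Demo` and `BIG_TABLE` are illustrative names, and `sys.getsizeof` reports only shallow sizes.

```python
import inspect
import sys
import types


def _referenced_names(code: types.CodeType) -> set:
    """Names looked up by a code object and any nested code (lambdas, comprehensions)."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= _referenced_names(const)
    return names


def captured_globals(cls) -> dict:
    """Approximate the module globals referenced by cls's methods, mapped to shallow sizes in bytes.

    co_names also holds attribute names (e.g. `n` in `self.n`), so only names that
    actually exist in the method's module globals are kept.
    """
    found = {}
    for attr in vars(cls).values():
        # Unwrap staticmethod/classmethod to reach the underlying function
        func = attr.__func__ if isinstance(attr, (staticmethod, classmethod)) else attr
        if not inspect.isfunction(func):
            continue
        for name in _referenced_names(func.__code__):
            if name in func.__globals__:
                found[name] = sys.getsizeof(func.__globals__[name])
    return found


# Example: a large module-level object that a method mentions by name
BIG_TABLE = list(range(1_000_000))


class Demo:
    def __init__(self):
        self.n = len(BIG_TABLE)


for name, size in sorted(captured_globals(Demo).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {size / 2**20:.1f} MiB (shallow)")
```

Anything large that shows up in this list (a dataset, a loaded model, a big config object) is a candidate for being copied into every replica along with the class definition.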
