I just started my Ray Serve application and noticed the following high memory usage before I had sent my first request.
I have two deployments: one is a health check, and the other does the prediction. I'm curious why my prediction replicas are already using 1 GB of memory before I send any request. Here is my code:
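```python
import ray
from ray import serve

# conf, constants, util, MODEL_ACTOR and make_context are defined elsewhere in my project.


@serve.deployment(route_prefix="/predict", num_replicas=conf[constants.SERVER_COUNT])
class ImageModel:
    def __init__(self):
        self.logger = util.make_logger("ray")
        # Look up the existing named actor MODEL_ACTOR
        self.model_actor = ray.get_actor(MODEL_ACTOR)
        self.context = make_context()  # an object to store loggers and datetime functions
        self.logger.info("Ready to Serve")
```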
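To put a concrete number on the per-replica footprint from inside the process, a small helper like the hypothetical `peak_rss_mb` could be used. This is a sketch that assumes a Unix host, because the standard-library `resource` module is not available on Windows. It reports the process's *peak* resident set size rather than its current usage, so it gives an upper bound on what the replica has held so far.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Hypothetical helper: peak resident set size of this process, in MB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)  # bytes -> MB
    return peak / 1024  # kilobytes -> MB


if __name__ == "__main__":
    # Example: log this at the end of ImageModel.__init__ instead of printing
    print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

Calling it at the end of `ImageModel.__init__` and passing the result to `self.logger.info` would show how much memory each replica has used right after construction, before any request arrives.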