Ray Serve - Observing high latencies when using a custom Docker image

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task, as this needs to go to production soon.

Hello,

Initially, I was using the stock rayproject/ray:2.39.0 image and supplying my dependencies through runtime_env for each deployment.
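
For context, the previous setup looked roughly like this (the deployment class and package list are illustrative):

from ray import serve

# Dependencies were supplied per deployment via runtime_env,
# on top of the stock rayproject/ray:2.39.0 image.
@serve.deployment(
    ray_actor_options={
        "runtime_env": {"pip": ["mlflow==2.15.1"]}
    }
)
class Predictor:
    def __init__(self):
        ...  # model loading

    async def __call__(self, request):
        ...  # inference

app = Predictor.bind()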

Now, I am packaging my ML model as an MLflow pyfunc model, so I needed to create a custom Dockerfile that downloads the model artifacts and installs their dependencies into the image. My Dockerfile looks like this:

FROM rayproject/ray:2.39.0

# These build args are referenced below; without declaring them they
# would expand to empty strings at build time.
ARG MODEL_NAME
ARG MODEL_VERSION
ARG AWS_TOKEN

RUN pip install mlflow==2.15.1

COPY utils /utils
COPY ray-serve-app /serve-app

# Download the model artifacts into /tmp/model inside the image.
RUN sudo mkdir -p /tmp/model && sudo chmod -R 777 /tmp/model && python /utils/download_model_artifacts.py ${MODEL_NAME} ${MODEL_VERSION}

# Substitute the auth token into the credentialed URL recorded in conda.yaml.
RUN sed -i.bak -E "s|(https://.*:).*(@.*)|\1${AWS_TOKEN}\2|" "/tmp/model/conda.yaml"

# Install the model's dependencies into the base conda environment.
RUN conda env update --name base -f /tmp/model/conda.yaml

WORKDIR /serve-app

# Quoted so the shell does not glob the [serve] extra.
RUN pip install "ray[serve]==2.39.0"
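
For completeness, the Serve app loads the model once per replica at startup, from the path baked into the image (simplified sketch; the deployment name and request handling are illustrative):

from ray import serve
import mlflow.pyfunc

@serve.deployment
class MLflowModel:
    def __init__(self):
        # Load the pyfunc model once, at replica startup,
        # from the artifacts baked into the image.
        self.model = mlflow.pyfunc.load_model("/tmp/model")

    async def __call__(self, request):
        payload = await request.json()
        return self.model.predict(payload)

app = MLflowModel.bind()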

My per-request latencies have jumped from about 2 ms to almost 15 ms when I use this image for my Ray Serve deployment. Can you tell me what I am doing wrong, or where I should look for the root cause?
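
For reference, this is roughly how I measure the per-request latency (the endpoint URL and payload are illustrative):

import time
import requests

# Time sequential requests against the Serve HTTP endpoint and
# report the mean per-request latency in milliseconds.
latencies = []
for _ in range(100):
    start = time.perf_counter()
    requests.post("http://localhost:8000/", json={"inputs": [1, 2, 3]})
    latencies.append((time.perf_counter() - start) * 1000)

print(f"mean latency: {sum(latencies) / len(latencies):.2f} ms")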