Hello Ray Team,
I’m working with Ray Serve on Kubernetes and using serveConfigV2 to deploy two applications, image_classifier and text_generator. Here’s a snippet of my serveConfigV2:
serveConfigV2: |
  applications:
    - name: image_classifier
      import_path: image_classifier.image_classifier:serve_app
      route_prefix: /image_classifier
      runtime_env:
        env_vars:
          PYTHONPATH: "/mnt/gcs"
        pip: ["pillow", "opencv-python-headless"]
    - name: text_generator
      import_path: text_generator.text_generator:serve_app
      route_prefix: /text_generator
      runtime_env:
        env_vars:
          PYTHONPATH: "/mnt/gcs"
        pip: ["spacy"]
Questions:
1.) Default Caching Behavior: In this configuration, how does Ray handle the runtime environment caching for the pip dependencies (e.g., pillow, opencv-python-headless, spacy)? Is the cache shared across worker nodes, or is it local to each node?
2.) Shared Cache Configuration: If the cache is local by default, how can I configure Ray to use the mounted PVC (/mnt/gcs) as a shared cache for these pip dependencies? What are the necessary configuration changes to ensure that worker nodes reuse the cached packages, especially during scaling events?
Hi there Mayank! Welcome to the Ray community!
Regarding your first question about pip dependencies, I found this in our documentation: Environment Dependencies — Ray 2.44.1
Runtime environment resources on each node (such as conda environments, pip packages, or downloaded working_dir or py_modules directories) will be cached on the cluster to enable quick reuse across different runtime environments within a job. Each field (working_dir, py_modules, etc.) has its own cache whose size defaults to 10 GB. To change this default, you may set the environment variable RAY_RUNTIME_ENV_<field>_CACHE_SIZE_GB on each node in your cluster before starting Ray, e.g. export RAY_RUNTIME_ENV_WORKING_DIR_CACHE_SIZE_GB=1.5.
When the cache size limit is exceeded, resources not currently used by any Actor, Task or Job are deleted.
In other words, the runtime environment (including the pip packages) is cached on each node's local disk; it is not shared across worker nodes by default.
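On KubeRay, the usual way to set that kind of variable "on each node" is through the container env in your RayCluster spec (or the rayClusterConfig of a RayService), so every head and worker pod has it before Ray starts. A minimal sketch, where the group name, container names, and the 5 GB value are just placeholders, and the variable shown is the working_dir one from the docs quote above (other cached fields follow the same RAY_RUNTIME_ENV_<field>_CACHE_SIZE_GB pattern):

headGroupSpec:
  template:
    spec:
      containers:
        - name: ray-head
          env:
            - name: RAY_RUNTIME_ENV_WORKING_DIR_CACHE_SIZE_GB   # per-field cache size, per the docs above
              value: "5"
workerGroupSpecs:
  - groupName: workers                                          # placeholder group name
    template:
      spec:
        containers:
          - name: ray-worker
            env:
              - name: RAY_RUNTIME_ENV_WORKING_DIR_CACHE_SIZE_GB
                value: "5"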
Let me know if that answers your question.
For the second question: to get worker nodes to reuse cached pip dependencies, ~ I think ~ you would need a shared file system that all nodes can access. In your case you mentioned a mounted PVC (/mnt/gcs), and your PYTHONPATH is already pointing to it, which seems correct.
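For that to hold up during scaling events, the same PVC has to be mounted at /mnt/gcs on every pod, so it needs to appear in both the head group and every worker group of the cluster spec, and the PVC should support ReadWriteMany so multiple nodes can mount it at once. A rough sketch of the worker side, with gcs-shared-pvc as a placeholder claim name (repeat the same volume/volumeMount under headGroupSpec):

workerGroupSpecs:
  - groupName: workers                      # placeholder group name
    template:
      spec:
        containers:
          - name: ray-worker
            volumeMounts:
              - name: shared-packages
                mountPath: /mnt/gcs         # matches the PYTHONPATH in your serveConfigV2
        volumes:
          - name: shared-packages
            persistentVolumeClaim:
              claimName: gcs-shared-pvc     # placeholder; must be ReadWriteMany for multiple nodes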
Have you tried running your serveConfigV2? Is it working as expected with PYTHONPATH: "/mnt/gcs"? Is it caching as expected, or are you running into errors?
PS. We recently released uv-based Python package management for clusters, which might be of interest to you too: uv + Ray: Pain-Free Python Dependencies in Clusters | Anyscale
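If your Ray version supports the uv field in runtime_env (worth double-checking in the docs for the version you're running), the per-application dependencies in your serveConfigV2 could look roughly like this:

runtime_env:
  env_vars:
    PYTHONPATH: "/mnt/gcs"
  uv: ["pillow", "opencv-python-headless"]  # assumes uv support in your Ray version; otherwise keep pip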