Ray Serve multi-application config fails to import module

Hi team,
I'm trying to use Ray Serve multi-application configs to deploy an application to my Kubernetes cluster, with a presigned AWS URL for my working_dir. It seems to work locally, but when I try to deploy to Kubernetes, I get ModuleNotFoundError: No module named 'deployments'.

Here is my setup.
Directory layout:

application_root_dir
      - deployments.py (contains all the Ray deployments)
      - config.yaml (Ray Serve deployment config)
      ...

config.yaml file:

# This file was generated using the `serve build` command on Ray v2.4.0.

proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

applications:
- name: app1
  route_prefix: /path
  import_path: deployments:model_name
  runtime_env: {}
  deployments:
  - name: model_name
    autoscaling_config:
      min_replicas: 1
      initial_replicas: 2
      max_replicas: 5
      target_num_ongoing_requests_per_replica: 10.0
      metrics_interval_s: 10.0
      look_back_period_s: 30.0
      smoothing_factor: 1.0
      downscale_delay_s: 600.0
      upscale_delay_s: 30.0
    ray_actor_options:
      runtime_env:
        pip:
          ...
        working_dir: {aws s3 presigned url}
      num_cpus: 1.0

Kubernetes config:

# Based on ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-sample
spec:
  rayVersion: '2.4.0' # should match the Ray version in the image of the containers
  ######################headGroupSpecs#################################
  # head group template and specs
  enableInTreeAutoscaling: true
  autoscalerOptions:
    # upscalingMode is "Conservative", "Default", or "Aggressive."
    # Conservative: Upscaling is rate-limited; the number of pending worker pods is at most the size of the Ray cluster.
    # Default: Upscaling is not rate-limited.
    # Aggressive: An alias for Default; upscaling is not rate-limited.
    upscalingMode: Default
    # idleTimeoutSeconds is the number of seconds to wait before scaling down a worker pod which is not using Ray resources.
    idleTimeoutSeconds: 60
    # image optionally overrides the autoscaler's container image.
    # If instance.spec.rayVersion is at least "2.0.0", the autoscaler will default to the same image as
    # the ray container. For older Ray versions, the autoscaler will default to using the Ray 2.0.0 image.
    ## image: "my-repo/my-custom-autoscaler-image:tag"
    # imagePullPolicy optionally overrides the autoscaler container's default image pull policy (IfNotPresent).
    imagePullPolicy: IfNotPresent
    # Optionally specify the autoscaler container's securityContext.
    securityContext: {}
    env: []
    envFrom: []
    # resources specifies optional resource request and limit overrides for the autoscaler container.
    # The default autoscaler resource limits and requests should be sufficient for production use-cases.
    # However, for large Ray clusters, we recommend monitoring container resource usage to determine if overriding the defaults is required.
    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"
      requests:
        cpu: "500m"
        memory: "512Mi"
  headGroupSpec:
    # Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
    serviceType: ClusterIP
    # the number of head pods in this group (assuming there could be more than 1 in the future)
    replicas: 1
    # logical group name; here it is called head-group, but the name can also be functional
    # pod type: head or worker
    # rayNodeType: head # not needed since it is under headGroupSpec
    # the following params are used to complete the ray start command: ray start --head --block --port=6379 ...
    rayStartParams:
      port: '6379'
      #include_webui: 'true'
      object-store-memory: '100000000'
      # webui_host: "10.1.2.60"
      dashboard-host: '0.0.0.0'
      memory: '2147483648'
      node-ip-address: $MY_POD_IP # auto-completed as the head pod IP
      block: 'true'
    #pod template
    template:
      metadata:
        labels:
          # custom labels. NOTE: do not define custom labels that start with `raycluster.`; they may be used by the controller.
          # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
          rayCluster: raycluster-sample # will be injected if missing
          rayNodeType: head # will be injected if missing; must be head or worker
          groupName: headgroup # will be injected if missing
        # annotations for pod
        annotations:
          key: value
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray-ml:2.4.0-gpu
            imagePullPolicy: Always
            #image: bonsaidev.azurecr.io/bonsai/lazer-0-9-0-cpu:dev
            env:
              - name: MY_POD_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
            resources:
              limits:
                cpu: 1
                memory: 10Gi
              requests:
                cpu: 1
                memory: 10Gi
            ports:
              - containerPort: 6379
                name: gcs
              - containerPort: 8265 # Ray dashboard
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
              - containerPort: 52365
                name: dashboard-agent
  workerGroupSpecs:
    # the number of worker pods in this group
    - replicas: 1
      minReplicas: 1
      maxReplicas: 5
      # logical group name; here it is called small-group, but the name can also be functional
      groupName: small-group
      # if worker pods need to be added, we can simply increment the replicas
      # if worker pods need to be removed, we decrement the replicas, and populate the podsToDelete list
      # the operator will remove pods from the list until the number of replicas is satisfied
      # when a pod is confirmed to be deleted, its name will be removed from the list below
      #scaleStrategy:
      #  workersToDelete:
      #  - raycluster-complete-worker-small-group-bdtwh
      #  - raycluster-complete-worker-small-group-hv457
      #  - raycluster-complete-worker-small-group-k8tj7
      # the following params are used to complete the ray start command: ray start --block --node-ip-address=$MY_POD_IP ...
      rayStartParams:
        block: 'true'
        node-ip-address: $MY_POD_IP
      #pod template
      template:
        metadata:
          labels:
            key: value
          # annotations for pod
          annotations:
            key: value
        spec:
          initContainers:
            # the env var $RAY_IP is set by the operator if missing, with the value of the head service name
            - name: init-myservice
              image: busybox:1.28
              command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
          containers:
            - name: machine-learning # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
              image: rayproject/ray-ml:2.4.0-gpu
              imagePullPolicy: Always
              # environment variables to set in the container. Optional.
              # Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
              env:
                - name: RAY_DISABLE_DOCKER_CPU_WARNING
                  value: "1"
                - name: TYPE
                  value: "worker"
                - name: CPU_REQUEST
                  valueFrom:
                    resourceFieldRef:
                      containerName: machine-learning
                      resource: requests.cpu
                - name: CPU_LIMITS
                  valueFrom:
                    resourceFieldRef:
                      containerName: machine-learning
                      resource: limits.cpu
                - name: MEMORY_LIMITS
                  valueFrom:
                    resourceFieldRef:
                      containerName: machine-learning
                      resource: limits.memory
                - name: MEMORY_REQUESTS
                  valueFrom:
                    resourceFieldRef:
                      containerName: machine-learning
                      resource: requests.memory
                - name: MY_POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: MY_POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
              ports:
                - containerPort: 80
                  name: client
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh","-c","ray stop"]
              resources:
                limits:
                  cpu: 1
                  memory: 5Gi
                requests:
                  cpu: 1
                  memory: 5Gi

I have tested it locally and it seems to work.
Here are my testing steps, run from inside application_root_dir:

  • serve build --multi-app deployments:model_name -o config.yaml
  • ray start --head
  • serve deploy config.yaml
  • serve status:
name: app1
app_status:
  status: RUNNING
  message: ''
  deployment_timestamp: 1683592551.1961145
deployment_statuses:
- name: app1
  status: HEALTHY
  message: ''

However, when I try to deploy to my remote Kubernetes Ray cluster, I run:
serve deploy config.yaml --address {remote cluster dashboard agent address}
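(That address points at the head node's dashboard agent, i.e. something like http://{head node address}:52365, matching the dashboard-agent containerPort exposed in the Kubernetes config above.)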

And then when I check the serve status, I see this error:

Deploying app 'app1' failed:
ray::deploy_serve_application() (pid=5108, ip=172.31.107.146)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/serve/controller.py", line 938, in deploy_serve_application
    app = build(import_attr(import_path), name)
  File "/home/ray/anaconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'deployments'

Any thoughts on why this could be happening, or any workarounds? I would really appreciate your help.

Does anyone have any thoughts on this? Right now we are having to submit Ray jobs to run Serve applications instead of using the Ray Serve CR, which creates issues when we are doing things like refreshing the node group or upgrading the Ray Serve application.
cc @architkulkarni @eoakes, sorry to ping; I just saw that you folks had recently commented on the Ray Serve forum and would really appreciate your opinion on this.

How are the contents of application_root_dir getting onto the cluster? My guess is that you need to specify the working_dir in the application-level runtime_env field of config.yaml, rather than in ray_actor_options (see the sketch below). This will allow the Serve controller, not just the Serve replicas, to see the deployments.py file.
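For example, a minimal sketch of what that application entry might look like, keeping the placeholder URL from above (the rest of the file is unchanged):

applications:
- name: app1
  route_prefix: /path
  import_path: deployments:model_name
  runtime_env:
    working_dir: {aws s3 presigned url}  # moved up from ray_actor_options
  deployments:
  - name: model_name
    ...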

Thanks so much, that seemed to work! One more question: what's the recommended way to handle node refreshes with this setup? I assume that once the multi-application config becomes compatible with the k8s serveConfig, it would automatically bring things back up. But right now, if I refresh the nodes for a k8s version upgrade (for example), it wipes the Serve application and doesn't bring it back up afterwards.

Will RayService (see the KubeRay docs) work for that use case?
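For context, the RayService CR pairs a Serve config with a RayCluster spec, and the KubeRay operator should re-apply the Serve config when the underlying cluster is recreated. A rough single-application sketch, assuming the single-app serveConfig schema from the KubeRay docs (the cluster spec is elided):

apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample  # hypothetical name
spec:
  serveConfig:
    importPath: deployments:model_name
    runtimeEnv: |
      working_dir: {aws s3 presigned url}
  rayClusterConfig:
    rayVersion: '2.4.0'
    # same headGroupSpec / workerGroupSpecs as in the RayCluster above
    ...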

When I run serve build --multi-app {apps} -k -o config.yaml, it says that multi-app is not yet compatible with the Kubernetes config. Is there another way to add it?

Not sure about this one – maybe @eoakes has more information about a workaround for multi-app.