serveConfig with import path

1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: latest 2.43
  • Python version: 3.11
  • OS: Ubuntu
  • Cloud/Infrastructure: On-prem with AMD GPUs
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected: I copied the vllm-serve.py script to the /root folder in the image (I also tried mounting a volume that includes the script as a ConfigMap) and expected the RayService to import it. Instead, when I start the RayService, it fails with an error saying the vllm serve module is not found.
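
To narrow this down, here is a minimal sketch of a check that could be run inside the container. It assumes that an import_path of the form module:attribute is resolved roughly the way Python's importlib.import_module resolves a module name from sys.path; that is an assumption about Ray Serve's behaviour, not something I have confirmed. The helper names resolve_import_path and can_resolve are hypothetical, and the demo uses temporary directories with a placeholder script rather than the real /root/vllm-serve.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_import_path(import_path: str, search_dir: str):
    """Resolve a 'module:attribute' string with search_dir temporarily on sys.path.

    Hypothetical helper: it mimics a plain importlib lookup and is not Ray's code.
    """
    module_name, sep, attr_name = import_path.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {import_path!r}")

    sys.path.insert(0, search_dir)
    try:
        # Drop cached directory listings so newly created files are seen.
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)
    finally:
        sys.path.remove(search_dir)


def can_resolve(import_path: str, search_dir: str) -> bool:
    """Return True if the module imports and exposes the named attribute."""
    try:
        resolve_import_path(import_path, search_dir)
        return True
    except (ImportError, AttributeError):
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as empty_dir, \
            tempfile.TemporaryDirectory() as script_dir:
        # Placeholder standing in for the real vllm-serve.py.
        Path(script_dir, "vllm-serve.py").write_text("model = 'placeholder'\n")

        # Check the directory without the script first, then the one with it.
        print("without script:", can_resolve("vllm-serve:model", empty_dir))
        print("with script:   ", can_resolve("vllm-serve:model", script_dir))
```

If the same check fails when pointed at /root inside the Ray head and worker containers, the script's directory is probably not on the Python path of the process that loads the Serve application.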

Below is my rayservice manifest file:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llama-3-2-3b
spec:
  serveConfigV2: |
    applications:
    - name: fm-llama-3-2-3b
      route_prefix: /
      import_path: vllm-serve:model
      deployments:
      - name: VLLMDeployment
        #num_replicas: 16
        ray_actor_options:
          num_cpus: 2
          #num_gpus: 1
          # NOTE: num_gpus is set automatically based on TENSOR_PARALLELISM
      runtime_env:
        env_vars:
          BACKEND: "ray"
          MODEL_ID: "meta-llama/Llama-3.1-70B-Instruct"
          TENSOR_PARALLELISM: "2"
          PIPELINE_PARALLELISM: "2"
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
        num-gpus: "0"
      template:
        spec:
          containers:
          - name: ray-head
            image: cloud/ce:raycluster-rocm
            resources:
              limits:
                cpu: "8"
                memory: "20Gi"
              requests:
                cpu: "8"
                memory: "20Gi"
            ports:
            - containerPort: 6379
              name: gcs-server
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            - containerPort: 8888
              name: grpc
            env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-secret
                  key: hf_api_token
            envFrom:
            - configMapRef:
                name: server-config

    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 2
      numOfHosts: 2
      groupName: gpu-group
      rayStartParams:
        num-gpus: "2"
      template:
        spec:
          containers:
          - name: fm-llama-3-2-3b
            image: cloud/ce:raycluster-rocm
            volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - mountPath: /root/.cache/huggingface/hub
              name: ray-pvc-source
              subPath: .cache/huggingface/hub
            env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-secret
                  key: hf_api_token
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
            resources:
              limits:
                cpu: "50"
                memory: "200Gi"
                amd.com/gpu: "2"
              requests:
                cpu: "50"
                memory: "200Gi"
                amd.com/gpu: "2"
          volumes:
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 100Gi
          - name: ray-pvc-source
            persistentVolumeClaim:
              claimName: ray-pv-claim
          # Please add the following taints to the GPU node.
          tolerations:
            - key: "amd.com/gpu"
              operator: "Exists"
              effect: "NoSchedule"


  • Actual: