1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.
2. Environment:
- Ray version: 2.43 (latest)
- Python version: 3.11
- OS: Ubuntu
- Cloud/Infrastructure: On-prem with AMD GPUs
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: I copied the vllm-serve.py script to the /root folder in the image (I also tried mounting a volume that includes the script as a ConfigMap) and expected the Serve application to import vllm-serve:model and start serving. Instead, when I start the RayService, it fails with a "vllm serve module not found" error.
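For reference, a sketch of what I believe is happening (an assumption, not a confirmed diagnosis): Serve resolves import_path with Python's import machinery, so the script's directory must be on sys.path of the worker that runs the app (e.g. via PYTHONPATH or a runtime_env working_dir); copying the file to /root alone is not enough. Note I had to use an underscored module name (vllm_serve) in this standalone repro, since hyphens are not valid in dotted import paths; I am not sure whether the hyphen in vllm-serve itself is an additional problem.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a stand-in "serve script" into a fresh directory that is NOT on sys.path.
scripts = Path(tempfile.mkdtemp())
(scripts / "vllm_serve.py").write_text("model = 'placeholder'\n")

# Importing before the directory is on sys.path fails, mirroring the
# "module not found" error from the Serve controller.
try:
    importlib.import_module("vllm_serve")
except ModuleNotFoundError as exc:
    print(exc)  # No module named 'vllm_serve'

# Adding the directory to sys.path (what PYTHONPATH or a runtime_env
# working_dir effectively does) makes the same import succeed.
sys.path.insert(0, str(scripts))
module = importlib.import_module("vllm_serve")
print(module.model)  # placeholder
```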
Below is my rayservice manifest file:
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: llama-3-2-3b
spec:
  serveConfigV2: |
    applications:
      - name: fm-llama-3-2-3b
        route_prefix: /
        import_path: vllm-serve:model
        deployments:
          - name: VLLMDeployment
            #num_replicas: 16
            ray_actor_options:
              num_cpus: 2
              #num_gpus: 1
              # NOTE: num_gpus is set automatically based on TENSOR_PARALLELISM
        runtime_env:
          env_vars:
            BACKEND: "ray"
            MODEL_ID: "meta-llama/Llama-3.1-70B-Instruct"
            TENSOR_PARALLELISM: "2"
            PIPELINE_PARALLELISM: "2"
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
        num-gpus: "0"
      template:
        spec:
          containers:
            - name: ray-head
              image: cloud/ce:raycluster-rocm
              resources:
                limits:
                  cpu: "8"
                  memory: "20Gi"
                requests:
                  cpu: "8"
                  memory: "20Gi"
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
                - containerPort: 8888
                  name: grpc
              env:
                - name: HUGGING_FACE_HUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-secret
                      key: hf_api_token
              envFrom:
                - configMapRef:
                    name: server-config
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 2
        numOfHosts: 2
        groupName: gpu-group
        rayStartParams:
          num-gpus: "2"
        template:
          spec:
            containers:
              - name: fm-llama-3-2-3b
                image: cloud/ce:raycluster-rocm
                volumeMounts:
                  - mountPath: /dev/shm
                    name: dshm
                  - mountPath: /root/.cache/huggingface/hub
                    name: ray-pvc-source
                    subPath: .cache/huggingface/hub
                env:
                  - name: HUGGING_FACE_HUB_TOKEN
                    valueFrom:
                      secretKeyRef:
                        name: hf-secret
                        key: hf_api_token
                  - name: NODE_IP
                    valueFrom:
                      fieldRef:
                        fieldPath: status.hostIP
                resources:
                  limits:
                    cpu: "50"
                    memory: "200Gi"
                    amd.com/gpu: "2"
                  requests:
                    cpu: "50"
                    memory: "200Gi"
                    amd.com/gpu: "2"
            volumes:
              - name: dshm
                emptyDir:
                  medium: Memory
                  sizeLimit: 100Gi
              - name: ray-pvc-source
                persistentVolumeClaim:
                  claimName: ray-pv-claim
            # Please add the following taints to the GPU node.
            tolerations:
              - key: "amd.com/gpu"
                operator: "Exists"
                effect: "NoSchedule"
- Actual: The RayService fails to deploy; the Serve application reports that the vllm-serve module cannot be found.