1. Severity of the issue: (select one)
- None: I’m just curious or want clarification.
- Low: Annoying but doesn’t hinder my work.
- Medium: Significantly affects my productivity, but I can find a workaround.
- High: Completely blocks me.
2. Environment:
- Ray version: 2.40
- Python version: 3.11
- OS: Ubuntu 22.04
I am using vLLM + Volcano + Ray to build a distributed LLM inference service. There are two nodes in my Kubernetes cluster (named nodeA and nodeB). When I start the service, nodeA runs normally, but nodeB reports the following error:

```
nvidia-container-cli: device error: GPU-xxxx0cde: unknown device: unknown
```

After checking, the GPU UUID in the error belongs to a GPU on nodeA, not nodeB. How can I solve this problem?
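For reference, this is roughly how the check can be done (a minimal sketch; `<worker-pod-on-nodeB>` is a placeholder for the failing worker pod's name, and I am assuming the vGPU plugin hands the assigned device to the container via an injected `NVIDIA_VISIBLE_DEVICES` env var):

```shell
# List the GPU UUIDs each node actually exposes (run on nodeA and nodeB)
nvidia-smi -L

# Inspect the env vars in the failing worker's container spec;
# the UUID handed to the container should show up here
kubectl -n deployment-system get pod <worker-pod-on-nodeB> \
  -o jsonpath='{.spec.containers[0].env}'
```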
Here is my YAML file:

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: qwen25-05b
  namespace: deployment-system
spec:
  serveConfigV2: |
    applications:
      - name: llm
        route_prefix: /
        import_path: kuberay.ray-operator.config.samples.vllm.serve:model
        deployments:
          - name: VLLMDeployment
            num_replicas: 2
            ray_actor_options:
              num_cpus: 2
        runtime_env:
          env_vars:
            MODEL_ID: "/model"
            TENSOR_PARALLELISM: "1"
            PIPELINE_PARALLELISM: "2"
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
        disable-usage-stats: "true"
        no-monitor: "true"
      template:
        metadata:
          annotations:
            scheduling.volcano.sh/queue-name: model-deploy
        spec:
          schedulerName: volcano
          volumes:
            - name: model
              nfs:
                server: nodeA
                path: /nfs/public/model/
            - name: kuberay
              nfs:
                server: nodeA
                path: /nfs/kuberay/kuberay-master
          containers:
            - name: ray-head
              image: docker.io/library/ray:2.40.0-torch2.5.1-cuda12.1-patch-fix
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
                requests:
                  cpu: "2"
                  memory: "4Gi"
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
              volumeMounts:
                - name: model
                  mountPath: /model
                - name: kuberay
                  mountPath: /kuberay
    workerGroupSpecs:
      - replicas: 2
        minReplicas: 2
        maxReplicas: 2
        groupName: gpu-group
        rayStartParams:
          num-cpus: "2"
          num-gpus: "1"
          resources: '"{\"vgpu-memory\": 3000}"'
          disable-usage-stats: "true"
          no-monitor: "true"
        template:
          metadata:
            annotations:
              scheduling.volcano.sh/queue-name: model-deploy
          spec:
            schedulerName: volcano
            volumes:
              - name: model
                nfs:
                  server: nodeA
                  path: /nfs/public/model/
              - name: kuberay
                nfs:
                  server: nodeA
                  path: /nfs/kuberay/kuberay-master
            containers:
              - name: llm
                image: docker.io/library/ray:2.40.0-torch2.5.1-cuda12.1-patch-fix
                resources:
                  limits:
                    cpu: 2
                    memory: "4Gi"
                    volcano.sh/vgpu-number: 1
                    volcano.sh/vgpu-memory: 3000
                  requests:
                    cpu: "2"
                    memory: "4Gi"
                    volcano.sh/vgpu-number: 1
                    volcano.sh/vgpu-memory: 3000
                volumeMounts:
                  - name: model
                    mountPath: /model
                  - name: kuberay
                    mountPath: /kuberay
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: model-deploy
spec:
  weight: 1
  capability:
    cpu: 6
    memory: 12Gi
    volcano.sh/vgpu-number: 2
    volcano.sh/vgpu-memory: 6000
```
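In case it helps with diagnosis, a quick sketch of two checks that should narrow this down (namespace and node names as in the manifest above):

```shell
# Confirm nodeB actually advertises the Volcano vGPU extended resources
kubectl describe node nodeB | grep -i vgpu

# See which node each Ray worker pod was scheduled onto
kubectl -n deployment-system get pods -o wide
```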