Context
We are using Ray Serve to deploy a vLLM app. It was working well until we recently upgraded vLLM to 0.7.0 and adapted to the API changes.
We have an instance with six 4090 GPUs and deployed a Ray cluster on it with one head node and five worker nodes. All nodes are Docker containers, and each container is attached to one GPU.
Issue
The core issue is that whenever the vLLM app tries to load the model from disk, it fails to find a GPU in the container hosting the app.
- All containers have exactly the same environment (ENV).
- We can run the model directly with the vLLM CLI without any issue (see the snippet after this list).
- We have tried re-deploying many times, and each time the app is hosted on an arbitrary node. Whichever node ends up hosting the app throws an error saying it has no GPU; yet whenever that same node is not the hosting node, it works fine and loads the model weights smoothly.
- We are now trying to downgrade the vLLM version, but we would like to know whether this is a bug or a usage issue on our side. Thanks!
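
For reference, the sanity check in the second bullet above is roughly equivalent to the following offline snippet run inside a single container (the model path is a placeholder; on the command line we actually invoke the vLLM CLI, e.g. something like `vllm serve <model-path>`):

```python
# Rough offline equivalent of our per-container sanity check.
# "<model-path>" is a placeholder for the local path of our model weights.
from vllm import LLM, SamplingParams

llm = LLM(model="<model-path>")  # loads the weights onto the container's GPU
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

This kind of direct run works without issue, so the GPU is visible to vLLM itself when Ray Serve is not involved.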
Log
ValueError: Current node has no GPU available. current_node_resource={'node:172.17.0.6_group_0_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 0.001, 'accelerator_type:G': 1.0, 'node:172.17.0.6_group_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 0.001, 'CPU': 63.0, 'memory': 10593529856.0, 'object_store_memory': 4540084224.0, 'node:172.17.0.6': 0.999, 'bundle_group_0_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 999.999, 'bundle_group_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 999.999}. vLLM engine cannot start without GPU. Make sure you have at least 1 GPU available in a node current_node_id='6f6229bb91687736efbb6174c5885ad0ad7f5aa6fb53ad20afaee93a' current_ip='172.17.0.6'.
Full error log: GitHub gist 0aea4772b3273a2e9a6427c77eb25354
Reproduce
Python package versions
vllm==0.7.0
ray==2.41.0
ray[serve]==2.41.0
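
To give an idea of what "the vLLM app" looks like, here is a minimal sketch of the shape of our Serve deployment. The model path, engine arguments, and request handling below are simplified placeholders, not our exact production code; in this sketch, the ValueError quoted above corresponds to the engine construction in `__init__`:

```python
# Minimal sketch of the Ray Serve + vLLM deployment (simplified).
# "<model-path>" and the engine arguments are placeholders.
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid


@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self):
        engine_args = AsyncEngineArgs(
            model="<model-path>",    # placeholder: local path of the model weights
            tensor_parallel_size=1,  # placeholder: not our exact parallelism settings
        )
        # The "Current node has no GPU available" error is raised here,
        # while the engine starts and tries to load the model.
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)

    async def __call__(self, request) -> dict:
        prompt = (await request.json())["prompt"]
        params = SamplingParams(max_tokens=128)
        final_output = None
        async for output in self.engine.generate(prompt, params, random_uuid()):
            final_output = output
        return {"text": final_output.outputs[0].text}


app = VLLMDeployment.bind()
```

We deploy it with something like `serve.run(app)` (or the `serve run` CLI) against the running cluster, and the replica can land on any of the six nodes.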