Context
We are using Ray Serve to deploy a vLLM app. It was working well until we recently upgraded vLLM to 0.7.0 and adapted to the API changes.
We have an instance with six 4090 GPUs and deployed a Ray cluster on it with one head node and five worker nodes. All nodes are Docker containers, and each container is attached to one GPU.
Issue
The core issue is that whenever the vLLM app tries to load the model from disk, it fails to find a GPU in the container hosting the app.
- All containers have exactly the same environment (ENV).
- We can run the model directly with the vLLM CLI without any issue (see the snippet after this list).
- We have tried re-deploying many times, and each time the app is hosted on an arbitrary node. Whichever node ends up hosting the app throws an error saying it has no GPU; yet whenever that same node is not the hosting node, it works fine and loads the model weights smoothly.
- We are now trying to downgrade the vLLM version, but we would like to know whether this is a bug or a usage issue on our side. Thanks!
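
For reference, the sanity check in the second bullet above is roughly equivalent to the following offline snippet run inside a single container (the model path is a placeholder; on the command line we actually invoke the vLLM CLI, e.g. something like `vllm serve <model-path>`):

```python
# Rough offline equivalent of our per-container sanity check.
# "<model-path>" is a placeholder for the local path of our model weights.
from vllm import LLM, SamplingParams

llm = LLM(model="<model-path>")  # loads the weights onto the container's GPU
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```

This kind of direct run works without issue, so the GPU is visible to vLLM itself when Ray Serve is not involved.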
Log
ValueError: Current node has no GPU available. current_node_resource={'node:172.17.0.6_group_0_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 0.001, 'accelerator_type:G': 1.0, 'node:172.17.0.6_group_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 0.001, 'CPU': 63.0, 'memory': 10593529856.0, 'object_store_memory': 4540084224.0, 'node:172.17.0.6': 0.999, 'bundle_group_0_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 999.999, 'bundle_group_5ea4bf00a38e2ed9e9af4d4e2c3d2c000000': 999.999}. vLLM engine cannot start without GPU. Make sure you have at least 1 GPU available in a node current_node_id='6f6229bb91687736efbb6174c5885ad0ad7f5aa6fb53ad20afaee93a' current_ip='172.17.0.6'.
Full error log: GitHub gist 0aea4772b3273a2e9a6427c77eb25354
Reproduce
Python package versions
vllm==0.7.0
ray==2.41.0
ray[serve]==2.41.0
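
To give an idea of what "the vLLM app" looks like, here is a minimal sketch of the shape of our Serve deployment. The model path, engine arguments, and request handling below are simplified placeholders, not our exact production code; in this sketch, the ValueError quoted above corresponds to the engine construction in `__init__`:

```python
# Minimal sketch of the Ray Serve + vLLM deployment (simplified).
# "<model-path>" and the engine arguments are placeholders.
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams
from vllm.utils import random_uuid


@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self):
        engine_args = AsyncEngineArgs(
            model="<model-path>",    # placeholder: local path of the model weights
            tensor_parallel_size=1,  # placeholder: not our exact parallelism settings
        )
        # The "Current node has no GPU available" error is raised here,
        # while the engine starts and tries to load the model.
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)

    async def __call__(self, request) -> dict:
        prompt = (await request.json())["prompt"]
        params = SamplingParams(max_tokens=128)
        final_output = None
        async for output in self.engine.generate(prompt, params, random_uuid()):
            final_output = output
        return {"text": final_output.outputs[0].text}


app = VLLMDeployment.bind()
```

We deploy it with something like `serve.run(app)` (or the `serve run` CLI) against the running cluster, and the replica can land on any of the six nodes.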