Ray Serve: latest version vLLM example requires code modification to work

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi all, thanks for reading.

After launching a Ray cluster in AWS via ray up on the latest version, I am attempting to run the vLLM example.

https://docs.ray.io/en/latest/serve/tutorials/vllm-example.html

Serve starts and runs the LLM fine. When query.py is run against it, however, it produces an error; the key piece of the output is below:

(ServeReplica:default:VLLMDeployment pid=286, ip=) AttributeError: 'str' object has no attribute 'name'

I was able to get it to work, but only after modifying the vLLM code in https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_engine.py:

Original:

def _is_model_supported(self, model_name):
    return any(model.name == model_name for model in self.base_model_paths)

Modified:

def _is_model_supported(self, model_name):
    return True
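
For clarity on where the error comes from: _is_model_supported calls model.name on each entry of base_model_paths, which fails if those entries are plain strings. A toy illustration (the values here are made up):

# Stand-in values, for illustration only
base_model_paths = ["my-org/my-model"]   # plain strings, so no .name attribute
model_name = "my-org/my-model"

# Raises: AttributeError: 'str' object has no attribute 'name'
any(model.name == model_name for model in base_model_paths)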

And in https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py:

line 265:
- model_name = self.base_model_paths[0].name
+ model_name = self.base_model_paths[0]

line 599:
- model_name = self.base_model_paths[0].name
+ model_name = self.base_model_paths[0]

After this, the example works fine and serves queries without errors or issues.
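
An alternative that avoids patching vLLM would be to build the list the serving layer expects on the Ray example side, i.e. wrap the served model names in BaseModelPath objects before they are handed to OpenAIServingChat. A rough sketch with placeholder names; in vLLM 0.6.4.post1, BaseModelPath appears to live in vllm.entrypoints.openai.serving_engine (newer releases have moved it):

# Sketch only: wrap served model names in BaseModelPath instead of raw strings.
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.entrypoints.openai.serving_engine import BaseModelPath

engine_args = AsyncEngineArgs(model="my-org/my-model")  # placeholder model id

# served_model_name can be unset, a single string, or a list of strings
served = engine_args.served_model_name or engine_args.model
if isinstance(served, str):
    served = [served]

base_model_paths = [
    BaseModelPath(name=name, model_path=engine_args.model) for name in served
]
# Passing base_model_paths (instead of plain strings) to OpenAIServingChat should
# let model.name and base_model_paths[0].name resolve without modifying vLLM.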

This seems to be a bug in the Ray integration with the latest version of vLLM. However, I am very new to Ray and this is my first effort using it, so I am wondering if anyone else is facing the same issue, or can confirm that they are running the latest vLLM example from the official Ray docs without issue on the latest version of Ray?

@ripTrainJudo Which vLLM version are you using?

@cindy_zhang

vllm 0.6.4.post1
ray 2.40.0

The cluster was deployed with the following set for the Docker image:

image: "rayproject/ray-ml:2.40.0.deprecated-py39-gpu"

I also forgot to mention that in order to get the example llm.py to work I had to add:

+ runtime_env = {
+     "pip": ["vllm"]
+ }

  @serve.deployment(
+     ray_actor_options={"runtime_env": runtime_env},
      autoscaling_config={

Otherwise I would see an error about vllm not being found on the worker node. The doc did not make clear whether pip install vllm was meant to be run on the local workstation or on the head node, and it seemed to indicate that vllm would automatically be carried over to the worker nodes without needing to specify it.
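
Put together, the pattern above ends up looking roughly like this as a self-contained skeleton (the pinned version, GPU count, autoscaling numbers, and class body are placeholders, not what the docs example uses):

from ray import serve

# Install vllm in the replica's environment on whichever node it is scheduled to.
runtime_env = {"pip": ["vllm==0.6.4.post1"]}

@serve.deployment(
    ray_actor_options={"runtime_env": runtime_env, "num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class VLLMDeployment:
    def __init__(self):
        # Import inside the replica so it happens after the runtime_env
        # pip install has completed on the worker node.
        import vllm
        self.vllm_version = vllm.__version__

Alternatively, the same runtime_env can be supplied at the application level (for example via the runtime_env field of the Serve config) rather than per deployment.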

I am facing the same problem, @cindy_zhang. Were you able to solve it?

I have the same problem with 6.6.0 and Ray 3.0.0.

Please check out our new APIs for LLM serving instead; these will work with the latest versions of vLLM: Overview — Ray 2.43.0
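
For reference, the quick start from that Overview page boils down to roughly the following (paraphrased; exact import paths have shifted between Ray releases, so treat this as a sketch and defer to the linked docs for your version):

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llm",                          # name clients send in requests
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder HF model
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# Builds an OpenAI-compatible app (chat/completions endpoints) backed by vLLM.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)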

Is there an updated KubeRay example? I've been trying for a long time to get this working.

An example Serve YAML is here: Overview — Ray 2.43.0. To deploy with KubeRay, you can check out some sample CRs and follow the guide: Deploy on Kubernetes — Ray 2.43.0