Ray Serve: latest version vLLM example requires code modification to work

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi all, thanks for reading.

After launching a Ray cluster in AWS via ray up on the latest version, I am attempting to run the vLLM example.

https://docs.ray.io/en/latest/serve/tutorials/vllm-example.html

Serve starts and runs the LLM fine. When query.py is run against it, however, it produces an error; the key piece of the output is below:

(ServeReplica:default:VLLMDeployment pid=286, ip=) AttributeError: 'str' object has no attribute 'name'

I was able to get it to work, but only after modifying the vLLM code in https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_engine.py:

Original:

def _is_model_supported(self, model_name):
    return any(model.name == model_name for model in self.base_model_paths)

Modified:

def _is_model_supported(self, model_name):
    return True
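
For clarity on where the error comes from: _is_model_supported calls model.name on each entry of base_model_paths, which fails if those entries are plain strings. A toy illustration (the values here are made up):

# Stand-in values, for illustration only
base_model_paths = ["my-org/my-model"]   # plain strings, so no .name attribute
model_name = "my-org/my-model"

# Raises: AttributeError: 'str' object has no attribute 'name'
any(model.name == model_name for model in base_model_paths)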

And in https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py:

line 265:
- model_name = self.base_model_paths[0].name
+ model_name = self.base_model_paths[0]

line 599:
- model_name = self.base_model_paths[0].name
+ model_name = self.base_model_paths[0]

After this, the example works fine and serves queries without errors or issues.
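
An alternative that avoids patching vLLM would be to build the list the serving layer expects on the Ray example side, i.e. wrap the served model names in BaseModelPath objects before they are handed to OpenAIServingChat. A rough sketch with placeholder names; in vLLM 0.6.4.post1, BaseModelPath appears to live in vllm.entrypoints.openai.serving_engine (newer releases have moved it):

# Sketch only: wrap served model names in BaseModelPath instead of raw strings.
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.entrypoints.openai.serving_engine import BaseModelPath

engine_args = AsyncEngineArgs(model="my-org/my-model")  # placeholder model id

# served_model_name can be unset, a single string, or a list of strings
served = engine_args.served_model_name or engine_args.model
if isinstance(served, str):
    served = [served]

base_model_paths = [
    BaseModelPath(name=name, model_path=engine_args.model) for name in served
]
# Passing base_model_paths (instead of plain strings) to OpenAIServingChat should
# let model.name and base_model_paths[0].name resolve without modifying vLLM.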

This seems to be a bug in the Ray integration with the latest version of vLLM. However, I am very new to Ray and this is my first effort using it, so I am wondering if anyone else is facing the same issue, or can confirm that they are running the latest vLLM example from the official Ray docs without issue on the latest version of Ray?

@ripTrainJudo Which vLLM version are you using?

@cindy_zhang

vllm 0.6.4.post1
ray 2.40.0

The cluster was deployed with the following set for the Docker image:

image: "rayproject/ray-ml:2.40.0.deprecated-py39-gpu"

I also forgot to mention that in order to get the example llm.py to work I had to add:

+ runtime_env = {
+     "pip": ["vllm"]
+ }

  @serve.deployment(
+     ray_actor_options={"runtime_env": runtime_env},
      autoscaling_config={

Otherwise I would see an error about vllm not being found on the worker node. The doc did not make clear whether pip install vllm was meant to be run on the local workstation or on the head node, and it seemed to indicate that vllm would automatically be carried over to the worker nodes without needing to specify it.
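
Put together, the pattern above ends up looking roughly like this as a self-contained skeleton (the pinned version, GPU count, autoscaling numbers, and class body are placeholders, not what the docs example uses):

from ray import serve

# Install vllm in the replica's environment on whichever node it is scheduled to.
runtime_env = {"pip": ["vllm==0.6.4.post1"]}

@serve.deployment(
    ray_actor_options={"runtime_env": runtime_env, "num_gpus": 1},
    autoscaling_config={"min_replicas": 1, "max_replicas": 2},
)
class VLLMDeployment:
    def __init__(self):
        # Import inside the replica so it happens after the runtime_env
        # pip install has completed on the worker node.
        import vllm
        self.vllm_version = vllm.__version__

Alternatively, the same runtime_env can be supplied at the application level (for example via the runtime_env field of the Serve config) rather than per deployment.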

I am facing the same problem, @cindy_zhang. Were you able to solve it?

I have the same problem with 6.6.0 and Ray 3.0.0.

Please check out our new APIs for LLM serving instead; these will work with the latest versions of vLLM: Overview — Ray 2.43.0
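
For reference, the quick start from that Overview page boils down to roughly the following (paraphrased; exact import paths have shifted between Ray releases, so treat this as a sketch and defer to the linked docs for your version):

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-llm",                          # name clients send in requests
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder HF model
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# Builds an OpenAI-compatible app (chat/completions endpoints) backed by vLLM.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)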

Is there an updated KubeRay example? I've been trying for a long time to get this working.

An example Serve YAML is here: Overview — Ray 2.43.0. To deploy with KubeRay, you can check out some sample CRs and follow the guide: Deploy on Kubernetes — Ray 2.43.0