Parallel requests to a Ray Serve 'OpenAI Chat Completions API' built by following this guide: Serve a Large Language Model with vLLM — Ray 2.41.0
The model is Qwen2-VL, and each request contains both text and an image.
Sending one request at a time works fine, but parallel requests fail whenever 'max_ongoing_requests >= 2'. A condensed sketch of the deployment is below.
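This is roughly what the deployment looks like, reduced from the docs example for readability: the real code serves the full OpenAI chat schema via the guide's ingress, whereas this sketch accepts a simplified JSON body. The model path, prompt template, and handler are illustrative placeholders, not the exact code.

```python
# Condensed sketch of the deployment (simplified from the Ray docs' vLLM
# example). Model path and request body format are assumptions.
import base64
import io
import uuid

from fastapi import FastAPI, Request
from PIL import Image
from ray import serve
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams

app = FastAPI()


@serve.deployment(max_ongoing_requests=2)  # any value >= 2 hits the error
@serve.ingress(app)
class VLLMDeployment:
    def __init__(self):
        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model="Qwen/Qwen2-VL-7B-Instruct")  # assumed path
        )

    @app.post("/v1/chat/completions")
    async def chat(self, request: Request) -> dict:
        body = await request.json()
        # Decode the base64 image sent in the (simplified) request body.
        image = Image.open(io.BytesIO(base64.b64decode(body["image_b64"])))
        # Qwen2-VL expects one <|image_pad|> placeholder per image.
        prompt = {
            "prompt": (
                "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
                f"{body['text']}<|im_end|>\n<|im_start|>assistant\n"
            ),
            "multi_modal_data": {"image": image},
        }
        final = None
        async for out in self.engine.generate(
            prompt, SamplingParams(max_tokens=256), request_id=str(uuid.uuid4())
        ):
            final = out
        return {"text": final.outputs[0].text}


serve.run(VLLMDeployment.bind())
```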
The error stack is shown below:
ERROR 2025-01-23 00:22:21,963 vl_VLLMDeployment 4gtvteb2 e1d433cc-e551-4e5e-b10e-986dea9fe1ad /v1/chat/completions llm.py:128 - Error in generate()
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1654, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1287, in forward
    inputs_embeds = self._merge_multimodal_embeddings(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/vllm/model_executor/models/qwen2_vl.py", line 1237, in _merge_multimodal_embeddings
    inputs_embeds[mask, :] = multimodal_embeddings
RuntimeError: shape mismatch: value tensor of shape [644, 3584] cannot be broadcast to indexing result of shape [322, 3584]
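Note that the value tensor (644 rows) is exactly twice the indexing result (322 rows), as if the image embeddings of two batched requests were being written into a single request's placeholder slots. For completeness, this is roughly how the parallel requests are sent; the openai client usage is standard, but the base URL, model name, and image URL are placeholders:

```python
# Sketch of the client side, assuming the standard openai package.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="NOT_USED")


def ask(i: int) -> str:
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": f"Request {i}: describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }],
    )
    return resp.choices[0].message.content


# One request at a time succeeds; two in flight at once raises the
# shape mismatch above.
with ThreadPoolExecutor(max_workers=2) as pool:
    print(list(pool.map(ask, range(2))))
```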