1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.46.0
- Python version: 3.10.16
- OS: Ubuntu 22.04 LTS
- Cloud/Infrastructure: Azure VMSS
- Other libs/tools (if relevant):
- vLLM version: 0.8.5.post1
Hi, I am trying to host DeepSeek-R1 by following the "Serve DeepSeek" tutorial in the Ray 2.46.0 docs.
Below is the config I used to deploy the model:
http_options:
  host: 0.0.0.0
  port: 22300

applications:
- args:
    llm_configs:
      - model_loading_config:
          model_id: deepseek-ai/DeepSeek-R1
          model_source: /lustrefs/path_to_model/hf_hub/DeepSeek-R1
        deployment_config:
          autoscaling_config:
            min_replicas: 1
            max_replicas: 1
        runtime_env:
          env_vars:
            VLLM_USE_V1: "0"
        engine_kwargs:
          tensor_parallel_size: 8
          pipeline_parallel_size: 2
          gpu_memory_utilization: 0.8
          dtype: "auto"
          max_num_seqs: 20
          max_model_len: 8192
          enable_chunked_prefill: true
          enable_prefix_caching: true
          trust_remote_code: false
  import_path: ray.serve.llm:build_openai_app
  name: deepseek
  route_prefix: "/"
I am using 2 nodes with 8 H100 GPUs each. However, when I deploy with the above config, I come across the following error:
  File "/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1714, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 226, in _tcp_rendezvous_handler
    store = _create_c10d_store(
  File "/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/torch/distributed/rendezvous.py", line 194, in _create_c10d_store
    return TCPStore(
torch.distributed.DistNetworkError: The client socket has timed out after 600000ms while trying to connect to (10.199.1.40, 60605).
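The timeout points at the rendezvous port on the head node. To rule out the port simply being unreachable between the VMSS nodes, I can run a plain TCP probe from the worker node (just a stdlib sketch; the host and port below are the ones taken from the error message, not fixed values):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the rendezvous address reported in the error log.
print(can_connect("10.199.1.40", 60605, timeout=3.0))
```

Note that torch picks the rendezvous port dynamically on each run, so the port has to be read from the current error message before probing.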
If I instead enable the V1 engine (VLLM_USE_V1: "1") in the deployment config, I see the following issue. Note that with the same setup I am able to serve models that fit on a single GPU.
This may be caused by a slow __init__ or reconfigure method.
ERROR 2025-05-25 15:39:01,027 controller 1599002 -- Exception in Replica(id='wsxcezc4', deployment='LLMDeployment:DeepSeek-R1', app='deepseek'), the replica will be stopped.
Traceback (most recent call last):
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 694, in check_ready
    ) = ray.get(self._ready_obj_ref)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:deepseek:LLMDeployment:DeepSeek-R1.initialize_and_get_metadata() (pid=1620132, ip=10.xxx.x.40, actor_id=35bac4d9f170d2e4a246035405000000, repr=<ray.serve._private.replica.ServeReplica:deepseek:LLMDeployment:DeepSeek-R1 object at 0x14719acf20e0>)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 984, in initialize_and_get_metadata
    await self._replica_impl.initialize(deployment_config)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 713, in initialize
    raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 690, in initialize
    self._user_callable_asgi_app = await asyncio.wrap_future(
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 1384, in initialize_callable
    await self._call_func_or_gen(
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 1347, in _call_func_or_gen
    result = await result
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/llm/_internal/serve/deployments/llm/llm_server.py", line 440, in __init__
    await asyncio.wait_for(self._start_engine(), timeout=ENGINE_START_TIMEOUT_S)
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/llm/_internal/serve/deployments/llm/llm_server.py", line 486, in _start_engine
    await self.engine.start()
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 232, in start
    self.engine = await self._start_engine()
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 272, in _start_engine
    return await self._start_engine_v1()
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 350, in _start_engine_v1
    return self._start_async_llm_engine(
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 464, in _start_async_llm_engine
    return vllm.engine.async_llm_engine.AsyncLLMEngine(
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 118, in __init__
    self.engine_core = core_client_class(
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 642, in __init__
    super().__init__(
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 398, in __init__
    self._wait_for_engine_startup()
  File "/dnc/arshad.shaikh/miniconda3/envs/rayenv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 430, in _wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
This issue has completely blocked me and my team from using Ray in a multi-node setup. Can someone please help me with this?
Regards,
Arshad