I use DeepSpeed in Aviary to deploy the Falcon 40B and Llama 30B models, but the deployment always fails with the following message:
ERROR 2023-07-17 19:20:24,291 controller 2082882 deployment_state.py:567 - Exception in replica 'falcon_40b_chat_falcon_40b_chat#nAxRcf', the replica will be stopped.
Traceback (most recent call last):
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 565, in check_ready
_, self._version = ray.get(self._ready_obj_ref)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/worker.py", line 2491, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:falcon_40b_chat_falcon_40b_chat.initialize_and_get_metadata() (pid=2082996, ip=192.168.2.128, actor_id=49ec46570b34dddf6b70b02303000000, repr=<ray.serve._private.replica.ServeReplica:falcon_40b_chat_falcon_40b_chat object at 0x7ed54e033700>)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
await self.replica.update_user_config(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
await reconfigure_method(user_config)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
await self.predictor.rollover(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 376, in rollover
self.new_worker_group = await self._create_worker_group(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 486, in _create_worker_group
worker_group = await self._start_prediction_workers(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 411, in _start_prediction_workers
await asyncio.gather(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::PredictionWorker.init_model() (pid=2083390, ip=192.168.2.128, actor_id=a9a378e3cdb5d5e7edf8098103000000, repr=PredictionWorker:falcon_40b_chat)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 133, in init_model
self.generator = init_model(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
ret = func(*args, **kwargs)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
model, tokenizer = initializer.load(model_id)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 59, in load
return self.postprocess_model(model), self.postprocess_tokenizer(tokenizer)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/deepspeed.py", line 201, in postprocess_model
model = deepspeed.init_inference(
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/__init__.py", line 333, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 167, in __init__
self._create_model_parallel_group(config)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 264, in _create_model_parallel_group
get_accelerator().set_device(local_rank)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/accelerator/cuda_accelerator.py", line 38, in set_device
torch.cuda.set_device(device_index)
File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/torch/cuda/__init__.py", line 350, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO 2023-07-17 19:20:24,295 controller 2082882 deployment_state.py:887 - Stopping replica falcon_40b_chat_falcon_40b_chat#nAxRcf for deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:20:26,387 controller 2082882 deployment_state.py:1586 - Adding 1 replica to deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:20:26,388 controller 2082882 deployment_state.py:331 - Starting replica falcon_40b_chat_falcon_40b_chat#DGCaSN for deployment falcon_40b_chat_falcon_40b_chat.
WARNING 2023-07-17 19:20:56,468 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:21:26,517 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:21:56,582 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:22:26,663 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:22:56,715 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
The controller then restarts the replica in a loop: replicas falcon_40b_chat_falcon_40b_chat#DGCaSN and falcon_40b_chat_falcon_40b_chat#zuMDQO fail with exactly the same traceback and the same "RuntimeError: CUDA error: invalid device ordinal", and the cycle continues:
INFO 2023-07-17 19:34:03,620 controller 2082882 deployment_state.py:887 - Stopping replica falcon_40b_chat_falcon_40b_chat#WpwJfy for deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:34:05,678 controller 2082882 deployment_state.py:1586 - Adding 1 replica to deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:34:05,678 controller 2082882 deployment_state.py:331 - Starting replica falcon_40b_chat_falcon_40b_chat#sGPILL for deployment falcon_40b_chat_falcon_40b_chat.
WARNING 2023-07-17 19:34:35,682 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:35:05,719 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:35:35,746 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
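For context, the traceback shows DeepSpeed's _create_model_parallel_group calling torch.cuda.set_device(local_rank) and failing with "invalid device ordinal", so it looks like the rank being passed is larger than the number of GPUs the worker process can actually see. Below is a minimal sketch of the check I run to confirm what each Ray worker sees; it uses only standard Ray/PyTorch calls, and the num_gpus=1 value and ray.init(address="auto") are just placeholders for my setup, not the exact settings Aviary uses.

```python
import os

import ray
import torch


@ray.remote(num_gpus=1)  # 1 GPU per task is only for illustration; the real workers may request more
def gpu_visibility():
    """Report what a single Ray worker process can actually see."""
    return {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "ray_gpu_ids": ray.get_gpu_ids(),          # GPU IDs Ray assigned to this task
        "torch_device_count": torch.cuda.device_count(),  # GPUs visible to PyTorch in this process
    }


if __name__ == "__main__":
    ray.init(address="auto")  # assumes I connect to the same cluster the deployment runs on
    print(ray.get(gpu_visibility.remote()))
```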
Any suggestions?