Deploying Falcon 40B / Llama 30B with DeepSpeed in Aviary fails

I use DeepSpeed in Aviary to deploy the Falcon 40B / Llama 30B models, but it always fails with the following message:


ERROR 2023-07-17 19:20:24,291 controller 2082882 deployment_state.py:567 - Exception in replica 'falcon_40b_chat_falcon_40b_chat#nAxRcf', the replica will be stopped.
Traceback (most recent call last):
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 565, in check_ready
    _, self._version = ray.get(self._ready_obj_ref)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/worker.py", line 2491, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:falcon_40b_chat_falcon_40b_chat.initialize_and_get_metadata() (pid=2082996, ip=192.168.2.128, actor_id=49ec46570b34dddf6b70b02303000000, repr=<ray.serve._private.replica.ServeReplica:falcon_40b_chat_falcon_40b_chat object at 0x7ed54e033700>)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
    raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
    await self.replica.update_user_config(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
    await reconfigure_method(user_config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
    await self.predictor.rollover(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 376, in rollover
    self.new_worker_group = await self._create_worker_group(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 486, in _create_worker_group
    worker_group = await self._start_prediction_workers(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 411, in _start_prediction_workers
    await asyncio.gather(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::PredictionWorker.init_model() (pid=2083390, ip=192.168.2.128, actor_id=a9a378e3cdb5d5e7edf8098103000000, repr=PredictionWorker:falcon_40b_chat)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 133, in init_model
    self.generator = init_model(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
    ret = func(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
    pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
    model, tokenizer = initializer.load(model_id)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 59, in load
    return self.postprocess_model(model), self.postprocess_tokenizer(tokenizer)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/deepspeed.py", line 201, in postprocess_model
    model = deepspeed.init_inference(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/__init__.py", line 333, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 167, in __init__
    self._create_model_parallel_group(config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 264, in _create_model_parallel_group
    get_accelerator().set_device(local_rank)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/accelerator/cuda_accelerator.py", line 38, in set_device
    torch.cuda.set_device(device_index)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/torch/cuda/__init__.py", line 350, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO 2023-07-17 19:20:24,295 controller 2082882 deployment_state.py:887 - Stopping replica falcon_40b_chat_falcon_40b_chat#nAxRcf for deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:20:26,387 controller 2082882 deployment_state.py:1586 - Adding 1 replica to deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:20:26,388 controller 2082882 deployment_state.py:331 - Starting replica falcon_40b_chat_falcon_40b_chat#DGCaSN for deployment falcon_40b_chat_falcon_40b_chat.
WARNING 2023-07-17 19:20:56,468 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:21:26,517 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:21:56,582 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:22:26,663 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:22:56,715 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
ERROR 2023-07-17 19:22:58,584 controller 2082882 deployment_state.py:567 - Exception in replica 'falcon_40b_chat_falcon_40b_chat#DGCaSN', the replica will be stopped.
Traceback (most recent call last):
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 565, in check_ready
    _, self._version = ray.get(self._ready_obj_ref)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/worker.py", line 2491, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:falcon_40b_chat_falcon_40b_chat.initialize_and_get_metadata() (pid=2083757, ip=192.168.2.128, actor_id=d5f4c1b26cdb989833b0792a03000000, repr=<ray.serve._private.replica.ServeReplica:falcon_40b_chat_falcon_40b_chat object at 0x7f78c4a1b730>)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
    raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
    await self.replica.update_user_config(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
    await reconfigure_method(user_config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
    await self.predictor.rollover(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 376, in rollover
    self.new_worker_group = await self._create_worker_group(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 486, in _create_worker_group
    worker_group = await self._start_prediction_workers(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 411, in _start_prediction_workers
    await asyncio.gather(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::PredictionWorker.init_model() (pid=2084205, ip=192.168.2.128, actor_id=08a1c6347c1939f1e94644c703000000, repr=PredictionWorker:falcon_40b_chat)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 133, in init_model
    self.generator = init_model(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
    ret = func(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
    pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
    model, tokenizer = initializer.load(model_id)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 59, in load
    return self.postprocess_model(model), self.postprocess_tokenizer(tokenizer)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/deepspeed.py", line 201, in postprocess_model
    model = deepspeed.init_inference(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/__init__.py", line 333, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 167, in __init__
    self._create_model_parallel_group(config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 264, in _create_model_parallel_group
    get_accelerator().set_device(local_rank)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/accelerator/cuda_accelerator.py", line 38, in set_device
    torch.cuda.set_device(device_index)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/torch/cuda/__init__.py", line 350, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO 2023-07-17 19:22:58,585 controller 2082882 deployment_state.py:887 - Stopping replica falcon_40b_chat_falcon_40b_chat#DGCaSN for deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:23:00,647 controller 2082882 deployment_state.py:1586 - Adding 1 replica to deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:23:00,647 controller 2082882 deployment_state.py:331 - Starting replica falcon_40b_chat_falcon_40b_chat#zuMDQO for deployment falcon_40b_chat_falcon_40b_chat.
WARNING 2023-07-17 19:23:30,680 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:24:00,703 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:24:30,773 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:25:00,792 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:25:30,906 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
ERROR 2023-07-17 19:25:53,961 controller 2082882 deployment_state.py:567 - Exception in replica 'falcon_40b_chat_falcon_40b_chat#zuMDQO', the replica will be stopped.
Traceback (most recent call last):
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 565, in check_ready
    _, self._version = ray.get(self._ready_obj_ref)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/_private/worker.py", line 2491, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:falcon_40b_chat_falcon_40b_chat.initialize_and_get_metadata() (pid=2084593, ip=192.168.2.128, actor_id=a32685edfc2783a6e1acb04003000000, repr=<ray.serve._private.replica.ServeReplica:falcon_40b_chat_falcon_40b_chat object at 0x7f21c6c97790>)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
    raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
    await self.replica.update_user_config(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
    await reconfigure_method(user_config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
    await self.predictor.rollover(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 376, in rollover
    self.new_worker_group = await self._create_worker_group(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 486, in _create_worker_group
    worker_group = await self._start_prediction_workers(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 411, in _start_prediction_workers
    await asyncio.gather(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::PredictionWorker.init_model() (pid=2089394, ip=192.168.2.128, actor_id=03ac8570e59702d521b998bd03000000, repr=PredictionWorker:falcon_40b_chat)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 133, in init_model
    self.generator = init_model(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
    ret = func(*args, **kwargs)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
    pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
    model, tokenizer = initializer.load(model_id)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 59, in load
    return self.postprocess_model(model), self.postprocess_tokenizer(tokenizer)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/deepspeed.py", line 201, in postprocess_model
    model = deepspeed.init_inference(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/__init__.py", line 333, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 167, in __init__
    self._create_model_parallel_group(config)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 264, in _create_model_parallel_group
    get_accelerator().set_device(local_rank)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/deepspeed/accelerator/cuda_accelerator.py", line 38, in set_device
    torch.cuda.set_device(device_index)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/torch/cuda/__init__.py", line 350, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

INFO 2023-07-17 19:34:03,620 controller 2082882 deployment_state.py:887 - Stopping replica falcon_40b_chat_falcon_40b_chat#WpwJfy for deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:34:05,678 controller 2082882 deployment_state.py:1586 - Adding 1 replica to deployment falcon_40b_chat_falcon_40b_chat.
INFO 2023-07-17 19:34:05,678 controller 2082882 deployment_state.py:331 - Starting replica falcon_40b_chat_falcon_40b_chat#sGPILL for deployment falcon_40b_chat_falcon_40b_chat.
WARNING 2023-07-17 19:34:35,682 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:35:05,719 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.
WARNING 2023-07-17 19:35:35,746 controller 2082882 deployment_state.py:1909 - Deployment falcon_40b_chat_falcon_40b_chat has 1 replicas that have taken more than 30s to initialize. This may be caused by a slow __init__ or reconfigure method.


Any suggestions?

@allwefantasy Thanks for posting. Directing to the aviary folks.

cc: @Yard1 Any pointers here?

This seems like an issue with CUDA not detecting GPUs. Are you running this using the Docker image & Ray Cluster Launcher config from the aviary repo?

I guess the reason is like this:

DeepSpeed uses the worker's local rank as the CUDA device index, while Ray automatically sets CUDA_VISIBLE_DEVICES for each worker according to the num_gpus it requests. For example, suppose we have GPUs 0, 1, 2, 3 and 4 workers: the last worker gets CUDA_VISIBLE_DEVICES=3, so only a single device (re-indexed as 0) is visible inside that process. DeepSpeed nevertheless calls torch.cuda.set_device(3), and because device ordinal 3 does not exist under CUDA_VISIBLE_DEVICES=3, the "invalid device ordinal" exception is thrown.
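A minimal sketch of that mismatch (assuming a 4-GPU machine and a worker that Ray pinned to physical GPU 3; the environment variable and rank values are illustrative, not taken from the actual deployment):

```python
import os

# Mimic what Ray does for a worker that was assigned only physical GPU 3.
# Must be set before CUDA is initialized in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "3"

import torch

# Inside this process the single visible GPU is re-indexed as device 0.
print(torch.cuda.device_count())   # -> 1
torch.cuda.set_device(0)           # OK: indices are relative to visible devices

# DeepSpeed's _create_model_parallel_group() instead passes the worker's
# local rank (3 here) to set_device, and that ordinal no longer exists:
local_rank = 3
torch.cuda.set_device(local_rank)  # RuntimeError: CUDA error: invalid device ordinal
```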