I tried to deploy the attached .yaml file and got the following error:
The deployment failed to start 12 times in a row. This may be due to a problem with its constructor or initial health check failing. See controller logs for details. Retrying after 64 seconds. Error:
ray::ServeReplica:meta-llama--Llama-2-7b-chat-hf_meta-llama--Llama-2-7b-chat-hf.initialize_and_get_metadata()
File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
await self.replica.update_user_config(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
await reconfigure_method(user_config)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
await self.predictor.rollover(
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 373, in rollover
self.new_worker_group = await self._create_worker_group(
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 483, in _create_worker_group
worker_group = await self._start_prediction_workers(
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 408, in _start_prediction_workers
await asyncio.gather(
File "/home/ray/anaconda3/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(OSError): ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
File "/home/ray/anaconda3/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tf_model.h5
The above exception was the direct cause of the following exception:
ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/utils/hub.py", line 612, in has_file
hf_raise_for_status(r)
File "/home/ray/anaconda3/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b95723-259b48d308cc762a1c7783b5)
Repository Not Found for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tf_model.h5.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
During handling of the above exception, another exception occurred:
ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 95, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
return model_class.from_pretrained(
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2534, in from_pretrained
if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/utils/hub.py", line 616, in has_file
raise EnvironmentError(f"{path_or_repo} is not a local folder or a valid repository name on 'https://hf.co'.")
OSError: meta-llama/Llama-2-7b-chat-hf is not a local folder or a valid repository name on 'https://hf.co'.
During handling of the above exception, another exception occurred:
ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 130, in init_model
self.generator = init_model(
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
ret = func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
model, tokenizer = initializer.load(model_id)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 57, in load
model = self.load_model(model_id)
File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 105, in load_model
model = AutoModelForCausalLM.from_pretrained(
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
return model_class.from_pretrained(
File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2449, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /home/ray/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1d3cabadba7ec7f1a9ef2ba5467ad31b3b84ff0.
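From the 401 in the middle of the trace, I suspect the worker cannot authenticate against the gated meta-llama repo. For reference, this is the kind of standalone check I would run on the node to confirm access (sketch only, not part of the deployment; it assumes huggingface_hub is installed and a HUGGING_FACE_HUB_TOKEN is exported in that environment):

import os
from huggingface_hub import HfApi
from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

# Sanity check: can this environment see the gated repo with the token
# currently available? (token/env var usage here is an assumption)
api = HfApi(token=os.environ.get("HUGGING_FACE_HUB_TOKEN"))
try:
    info = api.model_info("meta-llama/Llama-2-7b-chat-hf")
    print("Repo reachable; files:", [s.rfilename for s in info.siblings])
except (GatedRepoError, RepositoryNotFoundError) as err:
    print("No access to the gated repo from this environment:", err)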
meta-llama--Llama-2-7b-chat-hf.yaml file:
deployment_config:
  autoscaling_config:
    min_replicas: 1
    initial_replicas: 1
    max_replicas: 8
    target_num_ongoing_requests_per_replica: 1.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 300.0
    upscale_delay_s: 90.0
  ray_actor_options:
    resources:
      accelerator_type_cpu: 0.01
model_config:
  model_id: meta-llama/Llama-2-7b-chat-hf
  batching: static
  initialization:
    s3_mirror_config:
      bucket_uri: s3://large-dl-models-mirror/models--meta-llama--Llama-2-7b-chat-hf/main-safetensors/
      s3_sync_args:
        - "--no-sign-request"
    initializer:
      type: DeviceMap
      dtype: bfloat16
      from_pretrained_kwargs:
        trust_remote_code: true
        use_cache: true
      use_bettertransformer: false
      torch_compile:
        backend: inductor
        mode: max-autotune
    pipeline: transformers
  generation:
    max_input_words: 800
    max_batch_size: 8
    generate_kwargs:
      do_sample: true
      max_new_tokens: 512
      min_new_tokens: 16
      top_p: 1.0
      top_k: 0
      temperature: 0.1
      repetition_penalty: 1.1
    prompt_format:
      system: "{instruction}"
      assistant: "{instruction}"
      trailing_assistant: ""
      user: "{instruction}"
      default_system_message: ""
    stopping_sequences: []
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
  num_cpus_per_worker: 4
  resources_per_worker:
    accelerator_type_a10: 0.01
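Since the final OSError says no pytorch_model.bin / tf_model.h5 / safetensors files were found in the local snapshot directory, it may also be worth confirming what the safetensors mirror prefix from s3_mirror_config actually contains. A rough anonymous listing of that prefix (sketch only; assumes boto3 is available, and mirrors the "--no-sign-request" sync arg above) would be:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous listing of the mirror prefix taken from bucket_uri in the config.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
resp = s3.list_objects_v2(
    Bucket="large-dl-models-mirror",
    Prefix="models--meta-llama--Llama-2-7b-chat-hf/main-safetensors/",
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])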