How to deploy the LLaMA 2 7B model with Aviary

I tried to deploy the attached .yaml file and got the following error:

The deployment failed to start 12 times in a row. This may be due to a problem with its constructor or initial health check failing. See controller logs for details. Retrying after 64 seconds. Error:
ray::ServeReplica:meta-llama--Llama-2-7b-chat-hf_meta-llama--Llama-2-7b-chat-hf.initialize_and_get_metadata()
  File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/ray/anaconda3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 413, in initialize_and_get_metadata
    raise RuntimeError(traceback.format_exc()) from None
RuntimeError: Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 403, in initialize_and_get_metadata
    await self.replica.update_user_config(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 638, in update_user_config
    await reconfigure_method(user_config)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/server/app.py", line 93, in reconfigure
    await self.predictor.rollover(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 373, in rollover
    self.new_worker_group = await self._create_worker_group(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 483, in _create_worker_group
    worker_group = await self._start_prediction_workers(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 408, in _start_prediction_workers
    await asyncio.gather(
  File "/home/ray/anaconda3/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
    return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(OSError): ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tf_model.h5

The above exception was the direct cause of the following exception:

ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/utils/hub.py", line 612, in has_file
    hf_raise_for_status(r)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-64b95723-259b48d308cc762a1c7783b5)

Repository Not Found for url: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/resolve/main/tf_model.h5.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 95, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2534, in from_pretrained
    if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/utils/hub.py", line 616, in has_file
    raise EnvironmentError(f"{path_or_repo} is not a local folder or a valid repository name on 'https://hf.co'.")
OSError: meta-llama/Llama-2-7b-chat-hf is not a local folder or a valid repository name on 'https://hf.co'.

During handling of the above exception, another exception occurred:

ray::PredictionWorker.init_model() (pid=2674, ip=172.31.12.249, actor_id=4e4458640ffc4aca98c8aee301000000, repr=PredictionWorker:meta-llama/Llama-2-7b-chat-hf)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 130, in init_model
    self.generator = init_model(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/utils.py", line 90, in inner
    ret = func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/predictor/predictor.py", line 73, in init_model
    pipeline = get_pipeline_cls_by_name(pipeline_name).from_initializer(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/pipelines/_base.py", line 43, in from_initializer
    model, tokenizer = initializer.load(model_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 57, in load
    model = self.load_model(model_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aviary/backend/llm/initializers/hf_transformers/base.py", line 105, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
    return model_class.from_pretrained(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2449, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /home/ray/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/c1d3cabadba7ec7f1a9ef2ba5467ad31b3b84ff0.
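If I'm reading the traceback right, the root of the chain is the 401 from huggingface.co: meta-llama/Llama-2-7b-chat-hf is a gated repo, so when transformers can't find pytorch_model.bin in the local snapshot and falls back to asking the Hub for tf_model.h5, the unauthenticated request fails and everything above it unwinds. To rule out an auth problem, something like this can confirm whether the token on the node can see the repo (a minimal sketch using huggingface_hub; reading the token from HUGGING_FACE_HUB_TOKEN is an assumption about where your credentials live):

import os
from huggingface_hub import HfApi

# Raises a 401 RepositoryNotFoundError if the token is missing or has
# no access to the gated repo; otherwise prints the repo's file list.
api = HfApi(token=os.environ.get("HUGGING_FACE_HUB_TOKEN"))
info = api.model_info("meta-llama/Llama-2-7b-chat-hf")
print([sibling.rfilename for sibling in info.siblings])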

The meta-llama--Llama-2-7b-chat-hf.yaml file:

deployment_config:
  autoscaling_config:
    min_replicas: 1
    initial_replicas: 1
    max_replicas: 8
    target_num_ongoing_requests_per_replica: 1.0
    metrics_interval_s: 10.0
    look_back_period_s: 30.0
    smoothing_factor: 1.0
    downscale_delay_s: 300.0
    upscale_delay_s: 90.0
  ray_actor_options:
    resources:
      accelerator_type_cpu: 0.01
model_config:
  model_id: meta-llama/Llama-2-7b-chat-hf
  batching: static
  initialization:
    s3_mirror_config:
      bucket_uri: s3://large-dl-models-mirror/models--meta-llama--Llama-2-7b-chat-hf/main-safetensors/
      s3_sync_args:
        - "--no-sign-request"
    initializer:
      type: DeviceMap
      dtype: bfloat16
      from_pretrained_kwargs:
        trust_remote_code: true
        use_cache: true
      use_bettertransformer: false
      torch_compile:
        backend: inductor
        mode: max-autotune
    pipeline: transformers
  generation:
    max_input_words: 800
    max_batch_size: 8
    generate_kwargs:
      do_sample: true
      max_new_tokens: 512
      min_new_tokens: 16
      top_p: 1.0
      top_k: 0
      temperature: 0.1
      repetition_penalty: 1.1
    prompt_format:
      system: "{instruction}"
      assistant: "{instruction}"
      trailing_assistant: ""
      user: "{instruction}"
      default_system_message: ""
    stopping_sequences: []
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
  num_cpus_per_worker: 4
  resources_per_worker:
    accelerator_type_a10: 0.01
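
One more observation on the config: s3_mirror_config pulls the weights from the public mirror with --no-sign-request, and the bucket path ends in main-safetensors, so my guess is the synced snapshot contains only .safetensors files, which would explain why from_pretrained complains that pytorch_model.bin, tf_model.h5, model.ckpt.index, and flax_model.msgpack are all missing. A quick way to list what actually landed in the cache (the snapshot path is copied from the traceback above):

import os

snapshot = ("/home/ray/.cache/huggingface/hub/"
            "models--meta-llama--Llama-2-7b-chat-hf/snapshots/"
            "c1d3cabadba7ec7f1a9ef2ba5467ad31b3b84ff0")
# Print each entry in the snapshot with its size, to see whether the
# S3 sync delivered safetensors-only weights.
for name in sorted(os.listdir(snapshot)):
    full = os.path.join(snapshot, name)
    size = os.path.getsize(full) if os.path.isfile(full) else "<dir>"
    print(f"{name}\t{size}")

If it really is safetensors-only, is there a setting in the yaml (or a transformers/safetensors version requirement) that makes the DeviceMap initializer load .safetensors weights?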