I tried to implement RAG with LlamaIndex using the Llama 2 7B chat model from Hugging Face, and served it with Ray Serve, but got the error below.
Error
2023-10-21 12:22:53,393 INFO scripts.py:471 -- Running import path: 'serve_llm:deployment'.
Matplotlib is building the font cache; this may take a moment.
2023-10-21 12:23:38,280 INFO worker.py:1633 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
(HTTPProxyActor pid=16828) INFO 2023-10-21 12:23:44,774 http_proxy 172.16.75.70 http_proxy.py:1433 - Proxy actor d9fb123b33148219362566d801000000 starting on node e62276f67d903f9e32eca47de663c9f0530ebb282461820d8a6e1440.
(HTTPProxyActor pid=16828) INFO 2023-10-21 12:23:44,784 http_proxy 172.16.75.70 http_proxy.py:1617 - Starting HTTP server on node: e62276f67d903f9e32eca47de663c9f0530ebb282461820d8a6e1440 listening on port 8000
(ServeController pid=16776) INFO 2023-10-21 12:23:44,831 controller 16776 deployment_state.py:1390 - Deploying new version of deployment DeployLLM in application 'default'.
(HTTPProxyActor pid=16828) INFO: Started server process [16828]
(ServeController pid=16776) INFO 2023-10-21 12:23:44,934 controller 16776 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
Downloading (…)lve/main/config.json: 100%|██████████| 614/614 [00:00<00:00, 4.44MB/s]
(ServeController pid=16776) ERROR 2023-10-21 12:23:54,497 controller 16776 deployment_state.py:617 - Exception in replica 'default#DeployLLM#PpRhvm', the replica will be stopped.
(ServeController pid=16776) Traceback (most recent call last):
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=16776) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=16776) return fn(*args, **kwargs)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=16776) return func(*args, **kwargs)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=16776) raise value.as_instanceof_cause()
(ServeController pid=16776) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=16868, ip=172.16.75.70, actor_id=8ad79b5144dd60d2ec6965fc01000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f664988d750>)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=16776) return self.__get_result()
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=16776) raise self._exception
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=16776) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=16776) RuntimeError: Traceback (most recent call last):
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=16776) await self._initialize_replica()
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=16776) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=16776) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 10, in __init__
(ServeController pid=16776) llm=prepare_model()
(ServeController pid=16776) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=16776) llm = HuggingFaceLLM(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=16776) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=16776) return model_class.from_pretrained(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=16776) raise ImportError(
(ServeController pid=16776) ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`
(ServeController pid=16776) INFO 2023-10-21 12:23:54,603 controller 16776 deployment_state.py:2027 - Replica default#DeployLLM#PpRhvm is stopped.
(ServeController pid=16776) INFO 2023-10-21 12:23:54,604 controller 16776 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=16776) ERROR 2023-10-21 12:24:02,307 controller 16776 deployment_state.py:617 - Exception in replica 'default#DeployLLM#PNVPBJ', the replica will be stopped.
(ServeController pid=16776) Traceback (most recent call last):
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=16776) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=16776) return fn(*args, **kwargs)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=16776) return func(*args, **kwargs)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=16776) raise value.as_instanceof_cause()
(ServeController pid=16776) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=17053, ip=172.16.75.70, actor_id=608647c3b2634de3f09ab06701000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7f529014d7e0>)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=16776) return self.__get_result()
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=16776) raise self._exception
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=16776) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=16776) RuntimeError: Traceback (most recent call last):
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=16776) await self._initialize_replica()
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=16776) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=16776) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 10, in __init__
(ServeController pid=16776) llm=prepare_model()
(ServeController pid=16776) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=16776) llm = HuggingFaceLLM(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=16776) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=16776) return model_class.from_pretrained(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=16776) raise ImportError(
(ServeController pid=16776) ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`
(ServeController pid=16776) INFO 2023-10-21 12:24:02,413 controller 16776 deployment_state.py:2027 - Replica default#DeployLLM#PNVPBJ is stopped.
(ServeController pid=16776) INFO 2023-10-21 12:24:02,413 controller 16776 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=16776) ERROR 2023-10-21 12:24:10,013 controller 16776 deployment_state.py:617 - Exception in replica 'default#DeployLLM#VbvoJE', the replica will be stopped.
(ServeController pid=16776) Traceback (most recent call last):
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/deployment_state.py", line 615, in check_ready
(ServeController pid=16776) _, self._version = ray.get(self._ready_obj_ref)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
(ServeController pid=16776) return fn(*args, **kwargs)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=16776) return func(*args, **kwargs)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
(ServeController pid=16776) raise value.as_instanceof_cause()
(ServeController pid=16776) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:default:DeployLLM.initialize_and_get_metadata() (pid=17213, ip=172.16.75.70, actor_id=7f50bed52c98e7e4ace37f1601000000, repr=<ray.serve._private.replica.ServeReplica:default:DeployLLM object at 0x7fd75c7a97b0>)
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 451, in result
(ServeController pid=16776) return self.__get_result()
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(ServeController pid=16776) raise self._exception
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 442, in initialize_and_get_metadata
(ServeController pid=16776) raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=16776) RuntimeError: Traceback (most recent call last):
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 430, in initialize_and_get_metadata
(ServeController pid=16776) await self._initialize_replica()
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/replica.py", line 190, in initialize_replica
(ServeController pid=16776) await sync_to_async(_callable.__init__)(*init_args, **init_kwargs)
(ServeController pid=16776) File "/home/ec2-user/SageMaker/multi_source_chatbot/./serve_llm.py", line 10, in __init__
(ServeController pid=16776) llm=prepare_model()
(ServeController pid=16776) File "/home/ec2-user/SageMaker/multi_source_chatbot/initialize_query_engine.py", line 62, in prepare_model
(ServeController pid=16776) llm = HuggingFaceLLM(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 131, in __init__
(ServeController pid=16776) self._model = model or AutoModelForCausalLM.from_pretrained(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
(ServeController pid=16776) return model_class.from_pretrained(
(ServeController pid=16776) File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in from_pretrained
(ServeController pid=16776) raise ImportError(
(ServeController pid=16776) ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,017 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) INFO 2023-10-21 12:24:10,119 controller 16776 deployment_state.py:2027 - Replica default#DeployLLM#VbvoJE is stopped.
(ServeController pid=16776) INFO 2023-10-21 12:24:10,120 controller 16776 deployment_state.py:1679 - Adding 1 replica to deployment DeployLLM in application 'default'.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,184 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,287 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,389 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,491 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,594 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,696 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,798 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/scripts.py", line 518, in run
handle = serve.run(app, host=host, port=port)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/api.py", line 574, in run
client.deploy_application(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 47, in check
return f(self, *args, **kwargs)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 330, in deploy_application
self._wait_for_application_running(name)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/ray/serve/_private/client.py", line 255, in _wait_for_application_running
raise RuntimeError(
RuntimeError: Deploying application default failed: The deployments ['DeployLLM'] are UNHEALTHY.
2023-10-21 12:24:10,917 ERR scripts.py:564 -- Received unexpected error, see console logs for more details. Shutting down...
(ServeController pid=16776) WARNING 2023-10-21 12:24:10,901 controller 16776 application_state.py:663 - The deployments ['DeployLLM'] are UNHEALTHY.
(ServeController pid=16776) INFO 2023-10-21 12:24:11,035 controller 16776 deployment_state.py:1707 - Removing 1 replica from deployment 'DeployLLM' in application 'default'.
(ServeController pid=16776) INFO 2023-10-21 12:24:17,201 controller 16776 deployment_state.py:2027 - Replica default#DeployLLM#hMLjHJ is stopped.
Deployment Code
from ray import serve
from starlette.requests import Request
from llama_index import StorageContext, load_index_from_storage
# prepare_model lives in initialize_query_engine.py (per the traceback);
# document_loader is assumed to come from the same module
from initialize_query_engine import prepare_model, document_loader

@serve.deployment
class DeployLLM:
    def __init__(self):
        llm = prepare_model()
        document_loader()
        storage_context = StorageContext.from_defaults(persist_dir="doc_index")
        # Load index from the storage context
        new_index = load_index_from_storage(storage_context)
        self.query_engine = new_index.as_query_engine()

    def run_index(self, prompt):
        # query the engine with the argument (was referencing an undefined `query`)
        return self.query_engine.query(prompt)

    async def __call__(self, request: Request):
        # query_params is a mapping, so index it rather than calling it
        query = request.query_params["text"]
        response = self.run_index(query)
        # llama_index returns a Response object, not a dict; return its text
        return str(response)

deployment = DeployLLM.bind()
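For reference, I start the app with the Serve CLI (the import path shown in the log above) and query it over HTTP. A minimal client sketch, assuming the default proxy port 8000 from the log and the "text" query parameter read in __call__ (the example question is just a placeholder):

# Start the app first:
#   serve run serve_llm:deployment
import requests

# Serve's HTTP proxy listens on port 8000 by default (see the log above);
# __call__ reads the "text" query parameter.
resp = requests.get("http://127.0.0.1:8000/", params={"text": "What does the document say?"})
print(resp.text)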
Loading the Llama 2 model using the LlamaIndex wrapper
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts.prompts import SimpleInputPrompt

def prepare_model():
    compute_dtype = getattr(torch, "float16")
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )
    model_name = "meta-llama/Llama-2-7b-chat-hf"
    system_prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain
why instead of answering something not correct. If you don't know the answer
to a question, please don't share false information.
<</SYS>>
"""
    # Throw together the query wrapper
    query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")
    llm = HuggingFaceLLM(
        context_window=3800,
        max_new_tokens=800,
        generate_kwargs={
            "temperature": 0.5,
            # "return_full_text": True,
            "do_sample": True,
            "repetition_penalty": 1.1,
            "top_p": 0.7,
            "top_k": 50,
            # "return_dict_in_generate": True,
        },
        system_prompt=system_prompt,  # was defined but never passed in
        query_wrapper_prompt=query_wrapper_prompt,
        tokenizer_name=model_name,
        model_name=model_name,
        device_map="auto",
        # change these settings below depending on your GPU
        model_kwargs={"quantization_config": bnb_config},
        tokenizer_outputs_to_remove=["token_type_ids"],
    )
    return llm
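This is how I verify the loader on its own, outside Serve; a minimal sketch using the LLM's complete method (the prompt is just a placeholder):

# Quick standalone sanity check of the quantized model (no Ray involved)
llm = prepare_model()
print(llm.complete("What is retrieval-augmented generation?"))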
Note: All the libraries are installed; when I run the LlamaIndex code without Ray Serve, it works perfectly.
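Since the same code works outside Serve, I suspect the replica processes may not see the same Python environment. One thing that might be worth trying (a sketch based on Ray's documented pip runtime_env support; I haven't confirmed it fixes this, and the package list is illustrative, not pinned):

import ray
from ray import serve

# Ask Ray to install accelerate/bitsandbytes into the environment every
# replica runs in, instead of relying on the driver's environment.
ray.init(runtime_env={"pip": ["accelerate", "bitsandbytes"]})
serve.run(deployment)  # `deployment` is DeployLLM.bind() from serve_llm.py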