RayServe with langchain

Rahil_Doshi · October 21, 2024, 10:25am

I have implemented a LangChain Ollama service and deployed it with Ray Serve. I set @serve.deployment(num_replicas=3, ray_actor_options={"num_gpus": 0.5}), so it should create 3 instances of the service and occupy 1.5 GPUs in total. Instead, it creates only 1 instance and occupies 0.5 GPU. I don't know why this is happening; I want 3 instances of the LangChain service. Could it be that LangChain does not allow the same model to be loaded multiple times?
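
For reference, the arithmetic behind the expected 1.5 GPUs can be sketched as below. This is only a rough check, assuming replicas are limited solely by the total GPU count that Ray sees (readable with `ray.cluster_resources().get("GPU")`); the helper names are hypothetical and not part of Ray, and the sketch does not explain why only one replica started here.

```python
from fractions import Fraction


def _frac(x):
    # Convert via str() so values like 0.1 become exact fractions.
    return Fraction(str(x))


def total_gpu_demand(num_replicas, gpus_per_replica):
    """Total GPUs requested by a deployment (hypothetical helper)."""
    return float(num_replicas * _frac(gpus_per_replica))


def max_replicas_that_fit(available_gpus, gpus_per_replica):
    """Upper bound on replicas, assuming only the GPU total matters."""
    return int(_frac(available_gpus) // _frac(gpus_per_replica))


if __name__ == "__main__":
    # Settings from the post: num_replicas=3, num_gpus=0.5 per replica.
    print(total_gpu_demand(3, 0.5))         # 1.5
    # If Ray only saw 0.5 GPU, at most one replica would fit.
    print(max_replicas_that_fit(0.5, 0.5))  # 1
    print(max_replicas_that_fit(2, 0.5))    # 4
```

If the GPU count Ray reports is below the total demand, fewer replicas than requested can be placed; whether that is the case in this cluster is not known from the post.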

eoakes · October 22, 2024, 2:56pm

Hey @Rahil_Doshi there is probably some kind of configuration issue here. Could you share more about how you’re deploying Serve and could you also post the logs from the serve controller? (/tmp/ray/session_latest/logs/controller_<pid>.log)
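
The controller log mentioned above can be located and tailed with a short script like the following. It is a minimal sketch: the directory and the `controller_<pid>.log` naming come from the path in the reply, while the function names and the 50-line default are assumptions made for illustration.

```python
import glob
import os


def find_controller_logs(log_dir="/tmp/ray/session_latest/logs"):
    """Return controller_<pid>.log files in log_dir, oldest first."""
    pattern = os.path.join(log_dir, "controller_*.log")
    return sorted(glob.glob(pattern), key=os.path.getmtime)


def tail(path, n=50):
    """Return the last n lines of a text file."""
    with open(path, "r", errors="replace") as f:
        return f.readlines()[-n:]


if __name__ == "__main__":
    logs = find_controller_logs()
    if not logs:
        print("No controller log found; is Serve running on this node?")
    else:
        # Print the end of the most recently modified controller log.
        print("".join(tail(logs[-1])))
```

Run it on the head node, where the Serve controller writes its log.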
