Parallel Inferencing for multiple users

adiran · April 21, 2024, 4:24pm

Hi,

I am building a SQL chatbot to converse with my SQL data. I run a local LLM with Ollama, use Flask to expose the API for the front-end, and use llama-index to build the SQL query pipeline. I want multiple users to be able to use this API to query the database at the same time. I am very new to Ray and wanted to ask whether Ray can help with this multi-user requirement and, if so, how. I think Ollama currently only supports serial inference. How can Ray's features, such as batch inference, help with this issue? I would really appreciate any advice on this.
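To make the question concrete, here is a minimal sketch of the behaviour I am hoping for, written with only Python's asyncio and not with Ray, Ollama, or Flask. The names `MicroBatcher`, `fake_llm_batch`, `max_batch_size`, and `wait_s` are made up for illustration, and the fake model just echoes the prompt. As far as I understand, Ray Serve provides dynamic request batching through its `@serve.batch` decorator, which would presumably replace this hand-rolled class.

```python
import asyncio


class MicroBatcher:
    """Hypothetical helper: groups concurrent requests into one model call."""

    def __init__(self, batch_fn, max_batch_size=4, wait_s=0.05):
        self.batch_fn = batch_fn            # async fn: list[str] -> list[str]
        self.max_batch_size = max_batch_size
        self.wait_s = wait_s                # how long to wait for more requests
        self.queue = None                   # created lazily inside the event loop
        self.worker = None

    async def submit(self, prompt):
        """Called once per user request; returns that user's answer."""
        if self.worker is None:
            self.queue = asyncio.Queue()
            self.worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def _run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives, then collect more
            # until the batch is full or the wait window closes.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            try:
                results = await self.batch_fn([prompt for prompt, _ in batch])
                for (_, fut), result in zip(batch, results):
                    fut.set_result(result)
            except Exception as exc:
                # Propagate a model failure to every waiting caller.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)


async def demo():
    batch_sizes = []

    async def fake_llm_batch(prompts):
        # Stand-in for a real batched LLM call (assumption: no real model here).
        batch_sizes.append(len(prompts))
        await asyncio.sleep(0.01)
        return [f"SQL for: {p}" for p in prompts]

    batcher = MicroBatcher(fake_llm_batch, max_batch_size=4, wait_s=0.05)
    # Four "users" send a question at the same time.
    answers = await asyncio.gather(
        *(batcher.submit(f"question {i}") for i in range(4))
    )
    batcher.worker.cancel()
    return answers, batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The idea is that each user still makes an ordinary single request, while the model behind the API sees fewer, larger calls instead of one call per user.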

Thank you.
