Parallel inference for multiple users


I am currently building a SQL chatbot to converse with my SQL data. I run a local LLM with Ollama, use Flask to expose the API consumed by the front end, and use llama-index to build a SQL query pipeline. I want multiple users to be able to hit this API and converse with the database concurrently.

I am very new to Ray and wanted to ask whether Ray can help with this multi-user requirement, and if so, how? As far as I know, Ollama currently handles inference requests serially. Could Ray features such as batch inference help with this issue? I would really appreciate any advice.
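To make the concern concrete, here is a small stdlib-only sketch of the behavior I am worried about (this is not my actual code; `run_query` is a stand-in for my llama-index SQL pipeline call, and the lock models Ollama serving one inference at a time, which is my current understanding):

```python
import threading
import time

inference_lock = threading.Lock()  # models Ollama's one-at-a-time inference
counter_lock = threading.Lock()
in_flight = 0
max_in_flight = 0

def run_query(user_question: str) -> str:
    """Stand-in for my Flask handler calling the llama-index SQL pipeline."""
    global in_flight, max_in_flight
    with inference_lock:  # every concurrent request queues here
        with counter_lock:
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
        time.sleep(0.05)  # pretend the LLM is generating a response
        with counter_lock:
            in_flight -= 1
    return f"answer for: {user_question}"

# Five "users" hit the API at the same time...
threads = [threading.Thread(target=run_query, args=(f"question {i}",))
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# ...but inference still happens strictly one request at a time.
print(max_in_flight)  # → 1
```

So even though Flask can accept requests concurrently, every user ends up waiting in a single queue behind the model. My question is essentially whether Ray (e.g. Ray Serve with multiple replicas, or batch inference) is the right tool to break this bottleneck.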

Thank you.