Hi, I’m trying to use TensorRT-LLM (https://github.com/NVIDIA/TensorRT-LLM) to deploy my model with Ray. However, I keep getting stuck on installing TensorRT-LLM, as it always produces errors related to mpi4py, and trying to compile tensorrt-llm from source always results in CUDA errors such as libcud*.dll not found. Can anyone who has tried TensorRT-LLM on Ray help me out? Thanks.
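For context, even a basic environment sanity check like the one below fails for me (a minimal sketch; the library names are Linux-style, and on Windows the CUDA runtime would be something like cudart64_*.dll instead):

```python
# Minimal sanity check before trying tensorrt_llm itself.
# Assumes a Linux environment; adjust library names for Windows.
import ctypes

# mpi4py is a dependency of tensorrt_llm; this import is where my install fails.
from mpi4py import MPI
print("MPI rank:", MPI.COMM_WORLD.Get_rank())

# Check that the CUDA runtime is visible to the dynamic loader.
# Raises OSError if the CUDA libraries aren't on LD_LIBRARY_PATH.
ctypes.CDLL("libcudart.so")
print("CUDA runtime found")

import tensorrt_llm
print("tensorrt_llm version:", tensorrt_llm.__version__)
```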
I am running into a similar issue. @rifkybujana, were you able to get around that?
Nope. Instead, I just use the officially released TensorRT support for RayLLM.
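For anyone landing here, a rough sketch of what wrapping TensorRT-LLM in a plain Ray Serve deployment can look like (this is not the official RayLLM integration; it assumes tensorrt_llm's high-level `LLM` API, and names like the model path are placeholders):

```python
# A minimal sketch of serving a model with Ray Serve + TensorRT-LLM's
# high-level LLM API. Not the official RayLLM integration; exact
# parameter names may differ across tensorrt_llm versions.
from ray import serve
from starlette.requests import Request


@serve.deployment(ray_actor_options={"num_gpus": 1})
class TRTLLMDeployment:
    def __init__(self, model_path: str):
        # Import inside the actor so the heavy CUDA/MPI init happens on the GPU worker.
        from tensorrt_llm import LLM, SamplingParams

        self.llm = LLM(model=model_path)
        self.sampling = SamplingParams(max_tokens=128)

    async def __call__(self, request: Request) -> str:
        prompt = (await request.json())["prompt"]
        outputs = self.llm.generate([prompt], self.sampling)
        return outputs[0].outputs[0].text


# Hypothetical local model path for illustration.
app = TRTLLMDeployment.bind(model_path="/models/my-model")
# serve.run(app)  # then POST {"prompt": "..."} to the Serve HTTP endpoint
```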