Adding a model compilation step (e.g., TensorRT) to offline inference

Hi Ray Team,

I have a Ray job for an offline inference workflow that uses Ray Data concepts like map_batches. I'd like to understand whether I can add a model compilation step before running the inference workflow. Suppose I have a model: before triggering the flow, how can I apply a model compilation framework such as TensorRT or Apache TVM, and then use the compiled version of the model in the flow? I assume that, by default, Ray Data does not provide an API for this.

Thank you

You're correct that Ray Data doesn't natively provide those APIs, but you should still be able to execute the workflow you described. Generally, it would look something like this (a rough code sketch follows the list):

  • Generate/compile your model (with any of the frameworks you mentioned)
  • Load the compiled model into your Ray Data pipeline (via a Ray Dataset workflow)
  • Use map_batches to apply the model to your data
  • Run your Ray Data workflow as usual
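
Here's a minimal sketch of that flow, under some assumptions I should flag: it uses Torch-TensorRT (`torch_tensorrt`) as the compiler, a torchvision ResNet-18 as a stand-in model, random arrays as stand-in data, and a Ray version recent enough that `map_batches` accepts `concurrency=` (older releases use `compute=ray.data.ActorPoolStrategy(...)` instead). The class name, shapes, and column names are all illustrative, not a prescribed API:

```python
import numpy as np
import ray
import torch
import torchvision  # example model source; swap in your own model


class TRTPredictor:
    """Stateful callable for map_batches: compile once per actor, reuse per batch."""

    def __init__(self):
        import torch_tensorrt  # import inside the actor process

        model = torchvision.models.resnet18(weights="DEFAULT").eval().cuda()
        # Compile for a fixed input shape; batch_size below must match it.
        self.model = torch_tensorrt.compile(
            model,
            inputs=[torch_tensorrt.Input((32, 3, 224, 224))],
        )

    def __call__(self, batch: dict) -> dict:
        images = torch.as_tensor(batch["image"], device="cuda")
        with torch.no_grad():
            batch["pred"] = self.model(images).argmax(dim=1).cpu().numpy()
        return batch


# Toy dataset of random "images"; in practice use read_parquet/read_images/etc.
ds = ray.data.from_items(
    [{"image": np.random.rand(3, 224, 224).astype(np.float32)} for _ in range(128)]
)

preds = ds.map_batches(
    TRTPredictor,
    batch_size=32,   # matches the shape the model was compiled for
    concurrency=2,   # two actor replicas (Ray >= 2.9)
    num_gpus=1,      # one GPU per replica
)
preds.show(5)
```

The one design point worth calling out is compiling inside the actor's `__init__`: compiled TensorRT/TVM engines generally aren't picklable, so each map_batches actor should build (or load from a saved engine file) its own copy rather than receiving one over the wire.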

We have a tutorial here that might be helpful: